SSAS -- to Partition or not?

by Rob 20. January 2010 01:16

Recently I had a question from a client whether there was really any performance improvement in SSAS cube processing when using partitioning strategies vs. just using a single partition.

While intuitively I "knew" that partitioning cube fact tables results in better parallelism and thus increased performance (when adequate CPU, memory and I/O resources can support it), I realized I really didn't have any empirical "proof" to backup my professional opinion.  So--I setup a simple test.

The following is a very simple comparison of processing the same data set with a cube design that's identical--except for the introduction of multiple partitions. For this test I used Analysis Services 2008R2 x64 running on a 4-core Xeon server

The data set processed includes 37 million fact rows, and several dimensions--the largest including around 4,000 members.  In the first test, a single partition is used.

In the second, the cube is partitioned along months, resulting in about 1.3 million rows per partition.  This is below the recommended threshold for partition sizing, but provides a decent (if basic) evaluation of the effect of parallelism on cube processing.

Results:

  • Time to process the cube using a single partition: 21 minutes
  • Time to process the cube using 23 partitions: 12.7 minutes
  • Bottom line: by partitioning the processing of the cube, there's a time savings of around 40% in this case
Observation: The server I used is in my lab and has 4 Xeon cores, 4GB RAM, and 1.3TB RAID 1E storage. During the single partition processing, memory demand didn't exceed 3GB, and CPU utilization didn't exceed 60% or so.  With partitioned processing, both CPU and memory were completely saturated.  When I get the chance (and the required loose change) to upgrade the server to 8 cores and more memory, I'd like to re-run this to see whether increased resources would provide an even greater benefit.  However, having at least this much empirical evidence is a good start.



Single partition processing

23 Partitions processing

Tags:

Analysis Services | SQL Server 2008

TechEd 2009: Microsoft BI - Gemini

by Rob 11. May 2009 23:57

Today I had the pleasure to attend Donald Farmer and Kamal Hathi's session on the Gemini project.  I've seen several of Mr. Farmer's presentations on BI and Data Mining, and I'm never disappointed at his skills and depth of knowledge!

Gemini is an in-memory, Excel-based, analysis services orientated technology that brings the power of dimensional modeling to a primarily Excel skilled audience--with the promise that the output of Gemini can be forward-engineered into managed Analysis Services solutions. 

Donald's quite entertaining slide deck was a major departure from the typical corporate spin that we're accustomed to seeing (and that I'm personally accustomed to producing) when we educate potential technology users.   

The slide deck casts the analyst, boss and IT admin as silent movie characters, where the analyst and IT professional struggle to keep the boss happy within the limitations of their skills and the existing content in the corporate DW.  It's the all-too-familiar "I need that analysis by tomorrow" versus "it takes time to put that data together".  Terribly apt and consistent with what we see in the "BI trenches" every day.

The silent movie theme stresses that this situation hasn't changed that much in many decades (and in any event in the two decades I've been working on BI solutions).  Of course, the slides were endlessly entertaining while driving home these points!

As for the "Gemini" product, it's quite exciting.  BlueGranite has been in the business of helping our clients' analysts pull info together quickly and make it meaningful since the ProClarity days.  One of my lingering concerns has been that--since ProClarity was acquired and the future of its core technology became uncertain--what tools within the Microsoft product suite would serve these "data diggers"? 

Excel pivot tables against SSAS cubes are fantastic--but how to make sure analysts can get data pulled together by the cube designer fast enough to meet business needs?  If the corporate governed DW/Cubes evolve slowly, how can analysts fill in the gaps? 

Today I still recommend ProClarity for the deep data analysts (as a complement to Excel), as it still provides more flexibility and richer visualization than Excel pivot tables for advanced analysts.  But I think Gemini may one day (hopefully soon) deliver even more, and really become the data digger's tool of choice.

Gemini is clearly intended to fill the analyst gap.  And in so many ways I think it will. It's at once a more approachable way for non-OLAP users to build rich dimensional models, a way to make more dynamic data integrations using the familiar Excel environment, and a rich OLAP query tool to be used against these analyst-generated models.

In truth, I can easily see even seasoned SSAS pros (including myself) using Gemini during prototype and early development, in addition its intended less sophisticated audience.

The things I love about Gemini:

1. Users import data from all types of sources:
  
a. Structured relational
  
b.
Existing OLAP Cubes
  
c.
Subscribe to "Service Documents" (a form of RSS feed that contains tabular data)
  
d. Paste in any tabular data (e.g. copied to clipboard from a web page)

2. Fast processing of large data volumes (100M rows demonstrated, 20M demo'd on a netbook)

3.  Ability to add calculated measures at the Excel pivot-table layer (sweet!)

4.  Ability to connect to Gemini models as SSAS data sources (rocks!)

5.  Tight integration with SharePoint as a basic architectural construct

6.  Translation of most OLAP concepts to Excel terminology more familiar to analysts

Some things I would like to see improved, or clarified. 

I would have asked these questions, but Q&A was cut short today due to time constraints...

1. As simple as they seem to MDX people (like me), my sense is the DAX expressions are going to be too complex for many of the analysts I train.  It reminds me of PPS-P PEL, in that the expressions are intended to be simplified from MDX, but they're still complex and require multidimensional thinking--which isn't a gift many of us are born with.   PEL generated lots of push-back from analysts when I demonstrated it to "real customers", and I fear DAX may as well.

2. To address #1, I hope the product team will consider following the model ProClarity set with it's KPI Designer, which allows users to use wizards to build calculated measures such as ranking and bubble-up exceptions.  KPI designer users build complex MDX without knowing that's what they're doing.  I still train new users on these tools, and it's a positive for them to use wizards (rather than purely language constructs, as with DAX).

3. I'd like to see some MDM tie-in to ensure that already accepted calculations, data sources, etc., can be drawn upon and re-used in a Gemini solution.  I can see a BI governance issue (and IT objection!) if many analysts are building silo BI solutions without some centralized baseline to start from.  Gemini doesn't prohibit a DW/MDM baseline, but it doesn't appear to promote one either.

4. Security really wasn't addressed in the session.  If analysts will be pulling 100M rows of fact data into a desktop solution…how does a corporate IT policy ensure that that such huge volumes of data isn't lost in the back of a taxi?  BitLocker would be a convenient answer to this question, but I hope it isn't the only one.

5. Data mining--not sure if it's possible to incorporate DM models into the Gemini models, but the combination of capabilities on the desktop would be really fantastic!

In-memory OLAP is a hot technology, and as illustrated by Donald's slides, the world has been waiting far too long to put such powerful tools in the hands of typical analysts.  I can't wait to see how this possibly disruptive technology impacts the wide swath of users its intended to benefit!

Tags: , , ,

Analysis Services | Analytics

SSAS 2008 Deployment: The connection either timed out or was lost

by keruibo 16. April 2009 01:32

The following issue is a problem in SSAS that you might run into either in test or deployment environments (see references). 

In a nutshell, when a client session running Vista or Windows 2008 talks over the network to an SSAS server running on Windows 2008, and they use Kerberos for authentication, there likely will at some point be connectivity problems that resemble timeout or firewall blocking problems. 

Errors might look like one of the following:

The connection either timed out or was lost.
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
An existing connection was forcibly closed by the remote host

The two places we’ve observed these problems are:

1.       MDX queries submitted to the server that are reasonably large, such as those PPS submits, or perhaps those coded by hand that become large.  “Large” might be something like 1000 bytes or so.

2.       Deploying databases over the network from a Vista/2008 client to a 2008 server.

This problem still exists as of SQL 2008 SP1, so applying that update isn’t a resolution. This problem definitely exists when client & server are on different machines; may not be a problem when client/server are on the same computer (e.g. VPC or BIDS on a remote desktop session). 

From what I understand, MS is aware of this and we should expect a post-SP1 CU soon.  Whether this is to be resolved as a Windows patch or a SSAS patch is unclear to me. 

The following are work-arounds in the mean-time:

1.       If either the server or client is running XP or Windows 2003, the problem should not occur in any case.

2.       If you can edit the connect string (e.g. SSRS/Excel), specify ;SSPI=NTLM as a parameter.    Be advised that user security delegation will not succeed if this workaround is used.

3.       Connect with the IP address instead of machine name or FQDN.  Since Kerberos is unsupported with IP address connections, this also forces a fallback to NTLM.  Kerberos delegation also will not succeed when this workaround is used.

 Of particular note is that if SSAS is to be deployed on a Windows 2008 server, all available workarounds currently imply kerberos delegation will be impossible (a short-term driver for SSAS deployment on Server 2003?). 

 References:

http://blogs.msdn.com/psssql/archive/2009/04/03/errors-may-occur-after-configuring-analysis-services-to-use-kerberos-authentication-on-advanced-encryption-standard-aware-operating-systems.aspx

http://denglishbi.spaces.live.com/Blog/cns!CD3E77E793DF6178!1214.entry

http://sqlblogcasts.com/blogs/drjohn/archive/2009/03/28/kerberos-kills-large-mdx-queries-on-windows-server-2008-ssas.aspx

Tags: , , ,

Analysis Services | Configuration | Security

Scripting Analysis Services Database Backup

by keruibo 30. August 2008 20:19

There are a number of ways to automate backups of Analysis Services databases.  Some examples I've seen: SSIS Packages, PowerShell, SQL Agent jobs with XMLA. 

Here is a "quick and dirty" way using a .vbs cscript that can be called form the operating system, or scheduled in SQL Agent as a cmdexec step. This script will backup all Analysis Services databases programatically (i.e. if someone adds a new database, it will automatically be backed-up without further operational changes).

I'm not saying this is the best way to do this, and the script below completely lacks error trapping.  But it is effective, simple to implement, and may form a good base for a robust script in your environment.

 

 '-------------  Get the target folder from the command-line  -----------
 strBackupPath = WScript.Arguments.Named.Item("folder")
 WScript.echo "Destination folder: " & strBackupPath 

 if strBackupPath = "" then     wScript.Echo "'folder' parameter resulted in a blank string"     WScript.Echo "Usage: cscript BackupSSAS.vbs /folder:<full path to backup folder>"     WScript.Quit 1  end if

 strBackupPath = strBackupPath + "\"

 '-------------- Get TimeStamp in form YYYYMMDDhhmmss  -------------  MyTimestamp = CStr(Year(Now()) * 10000 + Month(Now()) * 100 + Day(Now())) _    + CStr(Hour(Now()) * 10000 + Minute(Now()) * 100 + Second(Now()))     '-------------  Connect to SSAS Instance  --------------------  Set ssas = CreateObject("Microsoft.AnalysisServices.Server")  ssas.Connect("localhost")

 '-----------  Backup Databases   -------------------------  For Each MyDatabase In ssas.Databases      MyDatabase.Backup strBackupPath + MyDatabase.Name + "_" + MyTimestamp + ".abf"  Next    '------------  Disconnect from SSAS Instance  -----------------  ssas.Disconnect()

Tags:

Analysis Services

Analysis Services 2008 Improvements & BIDS Helper

by Rob 27. April 2008 23:33

Analysis Services cubes for OLAP analysis have improved substantially between SQL Server 2000 and 2005.  Still, there's always room for improvement.  The recent CTP (Community Technology Preview) editions of the upcoming SQL Server 2008 product shows that the product continues to improve, and that improvements align very well with what customers and developers want and need.

Some of my favorite new capabilities and features with SSAS 2008:

The dimension design improvements will be especially helpful for those new to AS cube development.  The current tools are excellent, and the available wizards do a great job getting new developers going in the right general direction.  Yet, as with many tools, the initial feeling of confidence can be misleading as a generic, wizard-driven design begins to crack under the strain of production data volumes and usage patterns.

The new dimension designer enhancements are meant to address this situation, with the goal to help designers make performant cubes from the very beginning.  Of course, there are things that can be done even in the 2005 release to improve the quality of designs.  The easiest to use tool is BIDS Helper, a freeware set of add-ins for BIDS.

Where BIDS Helper will give good advice, and should be part of the tool-box for most OLAP developers using SSAS, only a thorough understanding of what's happening under the hood will help achieve the best design and performance.  One of the best sources for this info is the Microsoft SQL Server 2005 Analysis Services Performance Guide provided by Microsoft.  A must-read for any serious SSAS OLAP developer.

Tags: ,

Analysis Services

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2010 Rob Kerr's BI Blog