SQL2014EE ‘Insufficient System Memory’ message

While preparing a data load demo for my presentation at SQLSaturday 317 in Oslo I experienced a hiccup with SQL 2014 EE: although the server used for the comparison has 200+ GB of RAM assigned to a freshly installed SQL instance, the BULK INSERT command to load a ~75 GB flat file into a newly created Clustered Columnstore Index (CCI) table failed with the following error message:

Msg 701, Level 17, State 128, Line xx
There is insufficient system memory in resource pool ‘default’ to run this query.

There are a lot of posts out there on this specific message but I didn’t find good insights, so hopefully my attempt helps you find a resolution: try executing the ‘ALTER RESOURCE GOVERNOR RESET STATISTICS’ command and check again; for me this was sufficient to clear the message and load the data successfully afterwards. If you’re interested in digging a bit deeper into the Resource Governor values and settings, continue reading!

The test commands:

Getting ready for SQLPass!
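
In essence the load boils down to something like the following sketch; the table definition, file path and terminator options are placeholders, not the exact demo script:

--Placeholder reconstruction of the demo load (names, types and options are assumptions)
CREATE TABLE dbo.DemoLoad
(
    col1 int          NOT NULL,
    col2 varchar(100) NOT NULL,
    col3 datetime     NOT NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX cci_DemoLoad ON dbo.DemoLoad;

--Load the ~75 GB flat file straight into the CCI table
BULK INSERT dbo.DemoLoad
FROM 'D:\Data\demo_75GB.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', TABLOCK);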

The message in the SQL Server Log is slightly different from the one returned in SSMS:
Error: 701, Severity: 17, State: 128
There is insufficient system memory in resource pool ‘default’ to run this query. 

Thinking about this message for a minute, my first thought was: well, this could make sense; out of the box SQL Server will grant a single query a maximum of 25% of the available memory. Could it be that we simply don’t have enough memory assigned (25% of 200 GB == 50 GB) to run this task? But that isn’t typical SQL Server behavior: worst case you would expect SQL Server to start spilling data to disk and report a lot of time spent on some waitstat such as the new ‘COLUMNSTORE_BUILD_THROTTLE’, wouldn’t you?

So, looking for an easy fix, I enabled the Resource Governor and increased the Memory Grant Percentage to 50% via the GUI, followed by 60%, 80%, and finally, with the value set to 95%, the bulk insert task completed and loaded the 75 GB of data from the flat file. This ‘trial & error’ approach made me feel a bit uncomfortable; do I really need 2.5x the amount of memory? I never experienced any such problem with direct data loads of easily 20x this size into CCI tables on SQL Server APS. So time to dig deeper…
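
(For reference, the same change can be scripted instead of clicking through the GUI; a minimal sketch with 50% as the example value:)

--Raise the memory grant limit for the default workload group
ALTER WORKLOAD GROUP [default] WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 50);
ALTER RESOURCE GOVERNOR RECONFIGURE;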


So, to run some investigations I reverted the RG settings back to the default 25% and disabled it. I then noticed that the DMVs related to the Resource Governor have been enhanced significantly in SQL 2014; new counters like ‘read_bytes_total’ and ‘write_bytes_total’ have been added to the sys.dm_resource_governor_resource_pools DMV (see Books Online for a full description).

Snapshotting the Resource Governor Resource Pool values

To take a quick snapshot of the current values, just copy them into a temp table:

--Snapshot RG Resource Pool stats:
USE tempdb;
SET NOCOUNT ON;
IF OBJECT_ID('tempdb.dbo.__RG_resource_pools') IS NOT NULL
BEGIN
    DROP TABLE tempdb.dbo.__RG_resource_pools;
END

SELECT GETDATE() AS Date, *
INTO tempdb.dbo.__RG_resource_pools
FROM sys.dm_resource_governor_resource_pools;


And then the fun part begins: watching the various values increase over time. Which values actually correlate best with the bulk insert task? I ended up primarily looking at the delta values of the following fields:


List the Resource Pool delta values over time

SET NOCOUNT ON;
SELECT
    b.Date AS Snapshot_Taken,
    DATEDIFF(ss, b.Date, GETDATE()) AS Seconds_Delta,
    a.name AS RG_Pool_name,
    a.statistics_start_time AS Last_Statistics_Update_start_time,
    (a.read_bytes_total  - b.read_bytes_total)  / 1024 / 1024 / 1024 AS Read_GB_Delta,
    (a.write_bytes_total - b.write_bytes_total) / 1024 / 1024 / 1024 AS Write_GB_Delta,
    --a.used_memgrant_kb - b.used_memgrant_kb AS Used_memgrant_kb_delta,
    a.cache_memory_kb - b.cache_memory_kb AS Cache_memory_kb_Delta,
    a.total_memgrant_count - b.total_memgrant_count AS Total_memgrant_count_Delta,
    (a.max_memory_kb  / 1024 / 1024) AS Max_Mem_GB,
    (a.used_memory_kb / 1024 / 1024) AS Used_Mem_GB
FROM sys.dm_resource_governor_resource_pools a
JOIN tempdb.dbo.__RG_resource_pools b ON a.name = b.name;

Query Output

The DMV snapshot nicely shows the ‘Read_GB_Delta’ column representing the size of the flat file being processed, while the ‘Write_GB_Delta’ column shows the size of the created CCI, i.e. the data written to disk!


What is really interesting here is that the input part, the 74 GB flat file, has been accounted to the internal pool, where resource consumption is not restricted in any form, while the actual CCI data, 25 GB, has been allocated to the default pool. 25 GB is a value that should have fit easily into the default group in the first place, right? So what happened?

But then, looking at the other columns, I noticed that the ‘statistics_start_time’ value hadn’t changed when I reverted the settings to their original values. According to the documentation this value represents ‘The time when statistics was reset for this pool.’, so I gave it a try and manually reset the RG statistics:

ALTER RESOURCE GOVERNOR RESET STATISTICS 

This indeed updated the statistics_start_time value, so I aliased the column in my query above as ‘Last_Statistics_Update_start_time’ to make it easier to interpret.

After the reset, the data loading completed successfully, also with the default settings, so problem solved!

Double-check

Just to validate that SQL Server 2014 EE will indeed throttle but still complete the CCI build process properly, as a test I lowered the Memory Grant % for the default group from 25% down to as little as 5% (== max. ~10 GB) and started the bulk insert one more time:

ALTER WORKLOAD GROUP [default] WITH (REQUEST_MAX_MEMORY_GRANT_PERCENT = 5);
ALTER RESOURCE GOVERNOR RESET STATISTICS;
ALTER RESOURCE GOVERNOR RECONFIGURE;

Surprisingly, the direct load into the CCI table completed successfully, and there was not a sign of ‘COLUMNSTORE_BUILD_THROTTLE’ or any of the fancy new waitstats; just the regular data loading ones!

Waitstats after bulk inserting into CCI with % mem. granted set to 5%.

Wrap-up

To overcome the ‘There is insufficient system memory in resource pool ‘default’ to run this query.’ error message, please check the sys.dm_resource_governor_resource_pools DMV values. Before increasing the Max. Memory Grant setting, first try updating the Resource Governor statistics with the ‘ALTER RESOURCE GOVERNOR RESET STATISTICS’ command.
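
A minimal quick check could look like this:

--When were the RG statistics last reset, and how much memory is each pool using?
SELECT name, statistics_start_time, used_memory_kb, max_memory_kb
FROM sys.dm_resource_governor_resource_pools;

--If statistics_start_time is stale, reset before touching any memory grant settings
ALTER RESOURCE GOVERNOR RESET STATISTICS;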

(And as always.. please let me know your findings!

- Henk)


How to Process an SSAS MOLAP cube as fast as possible – Part 2

Special thanks to Dirk Gubbels for his contribution to the data type optimization part and his review! (Follow Dirk on Twitter: @QualityQueries.)

In part 1 we looked at a method to quantify the work that gets done by SQL Server Analysis Services and found that the OLE DB provider with a network packet size of 32767 brings the best throughput while processing a single partition, maxing out the contribution per single CPU.

In this 2nd part we will focus on how to leverage 10 cores or more (64!) and benefit from every CPU available in your server while processing multiple partitions in parallel; I hope the tips and approach will help you to test and determine the maximum processing capacity of the cubes on your SSAS server and process them as fast as possible!

Quick Wins

If you have more than 10 cores in your SSAS server, the first thing you’ll notice when you start processing multiple partitions in parallel is that the Windows performance counter ‘% Processor time’ of the msmdsrv process stays steady at 1000%, which means 10 full CPUs are 100% busy processing. Also the ‘Rows read/sec’ counter will top out and produce a steady flat line, similar to the one below at 2 million rows read/sec (== 200K rows read/sec per CPU):

Flatline @ 2 Million Rows read/sec 

In our search for maximum processing performance we will increase that number to reflect the number of cores by modifying the Data Source properties: change ‘Maximum number of connections’ from 10 into the number of cores in your server. Our test server has 32 physical cores plus Hyper-Threading = 64 cores available.

1) # Connections

By default each cube will open up a maximum of 10 connections to a data source, which means that up to 10 partitions are processed at the same time. See the picture below: 10x status ‘In Progress’ for the AdventureWorks cube, which has been slightly expanded to span multiple years:

By default Maximum 10 connections

Just by changing the number of connections to 64, processing 64 partitions in parallel results in an average throughput of over 5 million rows read/sec, utilizing 40 cores (yellow line).

This seems a great number already, but it is effectively (5 million rows / 40 cores =) 125K rows per core, and we still see a flat line when looking at the effective throughput; this tells us that we are hitting the next bottleneck. Also, the CPU usage as visible in Windows Task Manager isn’t at its full capacity yet!

Increasing the number of connections from 10 to 64 active connections. 

Time to fire up another Xperf or Kernrate session to dig a bit deeper and zoom into the CPU ticks that are spent by the data provider:

Command syntax:

Kernrate -s 60 -w -v 0 -i 80000 -z sqlncli11 -z msmdsrv -z oleaut32 -z sqloledb -nv msmdsrv.exe -a -x -j c:\websymbols > SSAS_trace.txt

Kernrate base trace

This shows an almost identical result to the profiling of a single partition in blog part 1.

By profiling around a bit and checking both the OLE DB and some SQL Native Client sessions, you will surprisingly find that most of the CPU ticks are spent on… data type conversions.

zooming into SQLNCLI 

The other steps make sense and include lots of data validation; for example, while fetching new rows it checks for invalid characters etc. before the data gets pushed into an AS buffer. But the number 1 CPU consumer, CDataSource::DataConvert, is an area that we can optimize!

(To download a local copy of the symbol files yourself, just install the Windows Debugger by searching the net for ‘windbg download’ and run the symchk.exe utility to download all symbols that belong to all resident processes into the folder c:\websymbols\:

C:\Program Files (x86)\Windows Kits\8.1\Debuggers\x64\symchk.exe /r /ip * /s SRV*c:\websymbols\*http://msdl.microsoft.com/download/symbols )

2) Eliminate Data type conversions

This is an important topic: if the data types between your data source and the cube don’t match, the transport driver will need a lot of time to do the conversions, and this affects the overall processing capacity; basically Analysis Services has to wait for the conversion to complete before it can process the new incoming data, and this should be avoided.

Let’s go over an AdventureWorksDW2012 Internet_sales partition as an example:


By looking at the table or query that is the source for the partition, we can determine that it uses a range from the FactInternetSales table. But which data types are defined under the hood?

To get at all the data type information, just right-click the SSAS database name and script the entire DB into a new Query Editor window.

Search through the XML for the query source name that is used for the partition, like: msprop:DbTableName="FactInternetSales"

Script the entire SSAS DB for easy lookup of datatypes.

These should match the SQL Server data types; check especially for unsignedByte, short, String lengths, and Doubles (slow) vs. floats (fast). (We do have to warn you about the difference between exact numeric data types like Decimal vs. approximate ones like Float and Double here.)
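
To get the SQL Server side of the comparison in a single list, a simple query like the one below will do (adjust the database and table name to your own source):

--List the relational data types of the partition source table
SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE
FROM AdventureWorksDW2012.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'FactInternetSales'
ORDER BY ORDINAL_POSITION;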

A link to a list of how to map the data types is available here.

How can we quickly check and align the data types? Going over them all manually one by one isn’t fun, as you probably just found out. Searching the net, I ran into a really nice and useful utility written by John Tunnicliffe called ‘CheckCubeDataTypes’ that does the job for us; it compares a cube’s data source view with the data types/sizes of the corresponding dimensional attributes. (Kudos John!) But unfortunately, even after making sure the data types are aligned, running Kernrate again shows that DataConvert is still the number one consumer of CPU ticks on the SSAS side.

3) Optimize the data types at the source

To prove that this conversion is our next bottleneck we can also create a view on the database source side and explicitly cast all fields to make sure they match the cube definition. (This is also an option for test environments where you don’t own the cube source & databases.)

Maybe as a best practice, CAST all columns even if you think the data types are right, and also exclude from the view the columns that are not used for processing the measure group. (For example, to process the FactInternetSales measure group from the AdventureWorksDW2012 cube we don’t need [CarrierTrackingNumber], [SalesOrderNumber], [PromotionKey] and [CustomerPONumber].) Every bit that we don’t have to push over the wire and process from the database source is a pure win. Just create a view with a name like ‘Speed’ to give it a try, as sketched below the picture.

Create a database view with all fields casted explicitly
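
In essence the view looks something like this; the column list is abbreviated and the target types are placeholders that must match your own cube definition:

--'Speed' view: explicit casts so the source output matches the cube data types
CREATE VIEW dbo.Speed
AS
SELECT
    CAST(ProductKey           AS int)      AS ProductKey,
    CAST(OrderDateKey         AS int)      AS OrderDateKey,
    CAST(CustomerKey          AS int)      AS CustomerKey,
    CAST(SalesOrderLineNumber AS tinyint)  AS SalesOrderLineNumber,
    CAST(OrderQuantity        AS smallint) AS OrderQuantity,
    CAST(UnitPrice            AS money)    AS UnitPrice,
    CAST(SalesAmount          AS money)    AS SalesAmount
    --etc.; leave out the columns the measure group doesn't need,
    --such as [CarrierTrackingNumber] and [CustomerPONumber]
FROM dbo.FactInternetSales;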

(Note: always be careful when changing data types! For example, in the picture above, using the ‘Money’ data type is okay because it is used by FactInternetSales, but Money is not a replacement for all Decimals (it will only keep 4 digits behind the decimal point and doesn’t provide the same range), so be careful when casting data types and double-check you don’t lose any data!)

Result: by using the data type optimized ‘Speed’ view as source, the total throughput increased from 5 to 6.6-6.8 million rows read/sec at 4600% CPU usage (== 147K rows/CPU). That’s 36% faster. We’re getting there!

The picture also shows that one of the physical CPU sockets (look at the 2nd line of 16 cores in NUMA node 1) is completely max’d out:

With the data types aligned, an extra 1.6 million rows read/sec are processed

4) Create a ‘Static Speed’ View for testing

If you would like to take the database performance out of the equation, something I found useful is to create a static view in the database with all the values pre-populated; this way there will still be a few logical reads from the database, but significantly less physical IO.

Approach:

1) Copy the original query from the cube:

Double-click on a cube partition

2) Request just the SELECT TOP (1):

SELECT TOP (1) From...

3) Create a Static view:

Add these values to a view named ‘Static_Speed’ and cast them all:

Cast all static values 
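
Something along these lines; the literal values come from an arbitrary TOP (1) row, so treat them as placeholders:

--'Static_Speed' view: constant literals, explicitly cast, so processing causes hardly any IO
CREATE VIEW dbo.Static_Speed
AS
SELECT
    CAST(310      AS int)      AS ProductKey,
    CAST(20050701 AS int)      AS OrderDateKey,
    CAST(11000    AS int)      AS CustomerKey,
    CAST(1        AS tinyint)  AS SalesOrderLineNumber,
    CAST(1        AS smallint) AS OrderQuantity,
    CAST(3578.27  AS money)    AS UnitPrice,
    CAST(3578.27  AS money)    AS SalesAmount;
    --etc.; same column list and data types as the 'Speed' view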

4) Create an additional test partition that queries the new Static_Speed view

Add as many partitions as the number of CPUs in the SSAS system

5) Copy this test partition multiple times

Create at least as many test partitions as the number of cores in your server, or more:

Script the test partition as created in step 4):

Scripted test partition

Create multiple new partitions from it by just changing the <ID> and <Name>; these will all run the same query against the static view. This way you can test the impact of your modifications to the view quickly and at scale!

6) Processing the test partitions

Process all these newly created test partitions, which will only query the static view, selecting at least as many of them as the number of CPUs you have available in your SSAS server.

Determine the maximum processing capacity of your cube server
by monitoring the ‘Rows Read/sec’!


Wrap Up

If you have a spare moment to check out the workload performance counters of your most demanding cube servers, you may find that there is room for improvement. If you see flat lines during cube processing, I hope your eyes will now start to blink; by increasing the number of connections, or by checking that you don’t spend your CPU cycles on data type conversions, you may get a similar improvement of over 3x, as shown in the example above. And when the Task Manager CPU utilization shows just one of the NUMA nodes completely max’d out, it might be time to start looking into some of the msmdsrv.ini file settings…
