How to increase the SQL Database Restore speed using DB compression and Solid State Disks

a couple of months ago I blogged about how to speed up a full database backup; but what about the restore speed? Indeed also an interesting topic to investigate. So I ran all kind of tests and found out that it is possible to restore a database at least 3 times faster provided that you have sufficient CPU’s and Disk IO capacity available.

The restore tests are executed on the same hardware as the previous posts; a 96 core Unisys ES7000 server with both a DSI 3500 and DSI 3600 Solid state storage unit attached. The database to restore is an 750GB Database with 642 GB of data spread across 96 files and 96 File Groups.

Monitoring a Restore Operation

SQL Server provides little information about the restore speed when a Full DB restore is in progress so lets rely on the regular windows perfmon counters like

- Disk Read+Write Bytes/sec to measure the throughput

- Average Disk Bytes per Read+Write to determine the IO Blockize

- % Processor time of the SQLServer process.

Step 1) The “Out of the Box” Restore performance

First step is to determine the effective available hardware I/O bandwidth on the server for both reading the backup file(s) and restoring the database files.

Easiest way to measure this is by using the SQLIO utility.  The restore command is using large IO’s ranging from 512 KB to over 1 MByte (up to 4) so lets run the SQLIO test with large blocks also. 

The SQLIO test shows we have a Random Read bandwidth available of over 3 GByte/sec and a Random Write bandwidth which is sustained over 2.74 GB/sec. This should be more than enough for our restore experiments.

Effective hardware R/W bandwidth

Step 2) Getting started: creating a Full database backup into a single destination file

To backup the complete database with compression (84267834 pages, or 642 GB) takes 2198 sec, which is 36 minutes 38 seconds. (I capture the exact durations with SQLProfiler monitoring the Master database). The backup creates a 317 GB flat file.

Starting the full backup through the SSMS GUI

Impression of the Backup speed: the database is being read with 290 MB/sec and written out to the destination file with 143 MB/sec. SQLServer is using approx. 4 CPU’s:

 image

Step 3) Restoring the database – Reading from a single file

To quickly check the content of a backup file you can use the SSMS GUI or use the following command:

— To display the content of a backup set:
RESTORE FILELISTONLY FROM
DISK = N’c:\DSI3601\LUN01\backup\TPCH_1TB_Full_1File’
WITH FILE=1;

Content of the backup set.

Through the SSMS GUI or with the following statement you can start the Restore operation:

RESTORE DATABASE [TPCH_1TB] FROM  
DISK = N’C:\DSI3601\LUN01\BACKUP\TPCH_1TB_Full_1File’
WITH  FILE = 1,  NOUNLOAD,  REPLACE,  STATS = 10
GO

Restore from a Single File

To restore the entire database takes 19 minutes and 46 seconds. Restoring the database is significantly faster than backing it up (which took 36 min.38 sec.)

The data is read from the flat file with 1 MByte IO’s and written to SSD with 512 KB IO’s using 6-7 cores;

Restore speed, reading from a single file

I logged the Average Disk Write Bytes/sec data into a Perfmon log and the counters show that we do not hit a hardware limit yet; the write throughput is a steady flat line which (typically) indicates that the throughput is limited by the application itself. Time to investigate our options!

Write speed flatliner - but not yet hitting a hardware bottleneck.

 

Restore Performance optimization tip #1

Writing out the Backup set to multiple destination files (“dump devices”) will speed up both the backup and restore process significantly.

To BACKUP the DATABASE to 9 files shows that 84267834 pages are backed up in 605.411 seconds (1087.430 MB/sec).

Restoring the database while reading from 9 backup files is speeding up the restore process; the write speed increases significantly from, on average, 600 MB/sec to 1479 MB/sec

and completes in 445.191 seconds using 18 CPU’s. This is 2.5x faster.

–Restore from 9 files with options
RESTORE DATABASE [TPCH_1TB]
FROM  
  DISK = N’c:\DSI3601\LUN01\backup\TPCH_1TB-Full’,
  DISK = N’C:\DSI3601\LUN01\backup\File2′,
  DISK = N’C:\DSI3601\LUN01\backup\File3′,
  DISK = N’C:\DSI3601\LUN00\backup\File4′,
  DISK = N’C:\DSI3601\LUN01\backup\File5′,
  DISK = N’c:\DSI3601\LUN01\backup\File6′,
  DISK = N’C:\DSI3601\LUN01\backup\File7′,
  DISK = N’C:\DSI3601\LUN01\backup\File8′,
  DISK = N’C:\DSI3601\LUN00\backup\File9′
WITH  FILE = 1,  NOUNLOAD,  REPLACE
, STATS = 10
GO

Impression of restoring the full db, reading from 9 backup files:

Full Restore, reading from 9 files

Restore Performance optimization tip #2

To ramp up the throughput even further experiment with specifying the BUFFERCOUNT value. This will bring the maximum throughput to our test.

Running @ Full speed the restore completes in only 5 minutes 40 seconds!

RESTORE DATABASE successfully processed 84267834 pages in 340.330 seconds (1934.423 MB/sec).

Specifying the BUFFERCOUNT value brings us another 454 MB /sec extra restore speed !

DBCC traceon(3605, 3004, 3014, 3213, -1)
go
RESTORE DATABASE [TPCH_1TB]
FROM  
  DISK = N’c:\DSI3601\LUN01\backup\TPCH_1TB-Full’,
  DISK = N’C:\DSI3601\LUN01\backup\File2′,
  DISK = N’C:\DSI3601\LUN01\backup\File3′,
  DISK = N’C:\DSI3601\LUN00\backup\File4′,
  DISK = N’C:\DSI3601\LUN01\backup\File5′,
  DISK = N’c:\DSI3601\LUN01\backup\File6′,
  DISK = N’C:\DSI3601\LUN01\backup\File7′,
  DISK = N’C:\DSI3500\LUN01\backup\File8′,
  DISK = N’C:\DSI3500\LUN00\backup\File9′
WITH  FILE = 1,  NOUNLOAD,  REPLACE
, BUFFERCOUNT = 2200
GO
DBCC traceoff(3605, 3004, 3014, 3213, -1)
GO

 

Impression of the various metrics while restoring @ maximum speed:

Impression of Full DB Restore, running at full speed

Observations

If you would like to gather more information about the backup and restore operations you can enable, (in your test environment only of course), the following trace flags: 
DBCC traceon(3605, 3004, 3014, 3213, -1). The information will be logged into the SQL Errorlog.

A lot of additional restore information is exposed by using some traceflags

Media Set BLOCKSIZE option

During the restore experiments I noticed that using the Media Set BLOCKSIZE option is a serious performance killer.

Also this option does not show up in the Log; I was assuming that this option would somehow affect some of the settings like the “MinMaxtranferSize” but it doesn’t. 

With the BLOCKSIZE set to 65536, the restore suddenly takes more than 8 minutes and the average write throughput is down to 1400 MB/sec ;-(

with the BLOCKSIZE set to 32768, the restore takes more than 10 minutes to complete.

In this PSS article I read that SQL2005 SP2 is capable of writing 1 MByte IO’s during restore operations to reduce file extension needs and NTFS metadata maintenance.

Increasing the MAXTRANSFERSIZE option seems to affect the write IO block size, it shows an increase from on avg. 600-800 KByte to 1MB or more, but during my experiments the mileage varied.

Sampling the SQL sys.dm_os_wait_stats  for 20 seconds:

image 

Explanation of the wait stats;

BACKUPIO: Occurs when a backup task is waiting for data or is waiting for a buffer in which to store data: unfortunately increasing the buffercount even further does not help to reduce this number 1 wait. Maybe adding more LUN’s will help.

DISPATCHER_QUEUE_SEMAPHORE:  Occurs when a thread from the dispatcher pool is waiting for more work to process.  

Zeroing out the log files at the beginning of the restore happens extremely fast with very large 4-5 MByte writes:

image 

 

Restoring on a File or FileGroup level

Besides the Full Database restore option, you can also restore on a File- or FileGroup basis. I ran a couple of tests  to see if these options will restore even faster.

Single FileGroup restore

To restore a single File Group (6.7 GB / 877928 pages) takes 7 minutes and 7 seconds (17.3 MB/sec). With Buffercount=1100 specified it takes only 3 minutes 12 seconds (41.2 MB/sec). A single FileGroup restore is using  up to 17 CPU’s so it wouldn’t be an option to restore multiple FileGroups in parallel to speed up the restore operation. Also because you cannot start 2 FG restore tasks in parallel:  When you start multiple FileGroup restore tasks you receive the message “Exclusive access could not be obtained because the database is in use”. Restoring them all in sequence 1 by 1 would have taken almost 5 hours.

But… you can specify All FileGroups to restore them all at once.   Restoring all 97 FileGroups (including the primary)  provide similar performance as a regular Full DB Restore:

Best result :  5 minutes 39 seconds. (RESTORE DATABASE … FILE=<name> successfully processed 84267834 pages in 339.174 seconds (1941.016 MB/sec)).

You can check the status of the Filegroup afterwards with the following command:  “SELECT name,state_desc  FROM sys.database_files” ;

If you forgot to restore one or more Filegroups, the “State_desc” status will be “RESTORING” instead of “ONLINE”.

Wrap-Up

The tricks to speed up a database Backup also apply to increase the Restore speed; simply add more backup destination files and specify a high Buffercount value.

This will bring you up to 3.3x faster Restores. In my tests the restore time of  a 462 GB Database decreased from roughly 20 minutes to less than 6 minutes.
Provided you have sufficient CPU’s (on average 23, up to 34 CPU’s are in use by SQLServer) and fast Solid State Storage to keep up with the IO throughput, SQL Server 2008R2 will leverage the available CPU’s and IO bandwidth very well to restore databases quickly. 

VN:F [1.9.17_1161]
Rating: 5.0/5 (10 votes cast)
VN:F [1.9.17_1161]
Rating: +11 (from 11 votes)

Reading as fast as possible from a table with SSIS (Part II)

Recently I blogged on how to be as selective as possible while reading from and OLE DB Data Source (Part I) and how to load the data from a single flat file as fast as possible into a SQL Server table. But you probably noticed that “Out of the box” SSIS reads data faster from a flat file than from a SQL table. In this article I will share my latest trick on how to speed up the reading from a single table, typically at least twice as fast!

Time to beat, the “Native” OLE DB source speed

The first step, like always, is to check the default throughput and duration by counting rows from the Native OLE DB Data Source.
In our case it takes 14 minutes and 6 seconds to read all approx. 180 million rows or 12.89 GB from a SQL Server table with 16 columns of data. For comparison; reading the same data from a single flat file takes only 7 minutes 57 seconds.

So where is this difference coming from? Primarily because of the IO Packet size that’s being used I would say. SSIS reads with 128 KB blocks from a flat file and uses by default 4KB (the default network packet size) which is changeable into max. 32 KB to request data from SQL Server (32KB is also the maximum for the new SQL2008 R2 version). To retrieve the data faster we have to find a way to handle more IO requests in parallel!

SQL Source Table properties 

To set the baseline just build a package and count the rows from the Native OLEDB Source. (Don’t forget to change the Packet Size from 0 into 32767 in the Connection Manager).

image

The Windows Performance Monitor counters to check are the CPU load of both the SSIS process (DtsDebughost when executing the package from within BIDS or DTEXEC when started from the command line) and the SQL Server process. Also check the amount of bytes we are reading: select the IO Read bytes/sec counter from the process. 

Throughput and CPU load reading from the data source 

Adding parallelism by using the Modulo algorithm

The optimization trick I would like to point out to you is based on using the Modulo arithmetic operator that can be used with the numeric data type. It returns the remainder of one number divided by another. For example, if you want to read with 3 parallel streams use can use modulo 3 ( % 3 ). You can pick up the 3 output streams by specifying as the remainder values = 0,1 and 2.  (Using value’s less than 0 or larger than 2 will return 0 rows when modulo 3 is used.)

You can check the output also by running the query in SSMS or with the BIDS Preview option from the OLEDB Source Editor.

select * from dbo.LINEITEM_Hash96Key_SSD
WHERE (L_ORDERKEY% 3) = 2

Query output

When you build this query into an SSIS package, you will notice that initially it does creates parallelism; the rows are quite nice divided and read from the two data sources, But unfortunately the total package execution time doesn’t come down, which is a bit strange.  (After all,  adding more sources is an optimization trick we discovered already long time ago, back in 2004 while testing an early Beta 2 release of SSIS).

Add the query to the OLE DB Source, specifying the correct stream with "=xx" which should always be less than the modulo factor. 

Specify in your package the multiple data sources and merge the result with the Union All component:

Merge the streams with the Union All component

The Activity monitor shows us quickly that indeed multiple SPIDS are fired by the SSIS package but many are Suspended and actually causing unnecessary synchronization:

SQL Activity Monitor shows CXPACKET waits

Luckily this can be resolved quickly by specifying the query hint “OPTION (MAXDOP 1)”. This removes the synchronization overhead without sacrificing on throughput in this case.

Specifiying the MAXDOP 1 hint

 

Reading multiple times from the same SQL table

Time to put it to the test and increase the number of Data source components in the package. Each OLE DB data source should point to the same input table,  just modify the modulo factor and the output separator when adding more. The best result I typically achieve while reading from the same data source 3 or 4 times in parallel:

 Package with 3 data source readers

When we check the perfmon counters one more time, you’ll see the IO throughput actually went up from initially 21. MB/sec to 46.6 MB/sec on average, so it has more than doubled!

The CPU consumption of both the SSIS and SQLServer process have also increased ;  SSIS from 100% to 350% (3.5 CPU) and SQLServer is using also an extra CPU.

Perfmon data while executing the package with 3 data readers 

Results

The overall package execution duration to read the same 180 million rows (12.89 GB)  decreased from initially 14 minutes 6 seconds to only 6 minutes and 33 seconds when reading the data with 3 parallel streams!

The test results 

 

Wrap Up

When you have to speed up reading large amounts of data from a single table in your SSIS package, you can decrease the duration by more than half by adding some form of parallelism into your package by using this  Modulo trick. Typically reading  data from the same SQL Server table with 3 or 4 parallel streams provides the best result! 

VN:F [1.9.17_1161]
Rating: 4.9/5 (18 votes cast)
VN:F [1.9.17_1161]
Rating: +15 (from 15 votes)
Better Tag Cloud