How to increase SQL Database Full Backup speed using compression and Solid State Disks

The SQL 2008 database backup compression feature (introduced as a SQL 2008 Enterprise Edition-only feature) will also become available in SQL 2008 R2 Standard Edition (see SQL 2008 R2 Editions). I have been using the compression feature for quite some time now and really love it; it saves significant amounts of disk space and increases the backup throughput. When such a great feature is available, I recommend you use it! In the following walkthrough I will show you some simple tricks to speed up the backup throughput.

How fast can we run a Full Backup on a single database with a couple of billion rows of data, occupying a couple hundred Gigabytes of disk space, spread across multiple Filegroups? 

Of course your server will use some extra CPU cycles to compress the data on the fly, and it will use the maximum available IO bandwidth. For that purpose I tested on some serious hardware with plenty of both: a 96-core Unisys ES7000 model 7600R with 2 DSI Solid State Disk units that deliver a total of 6+ GB/sec IO throughput.

SQL Database Server - Front side    Rear: DSI3400, DSI3500 hooked up to the digital Highway

Getting started with Database Backup Compression

Database backup compression is disabled by default; it can be enabled through the SSMS GUI or by adding the keyword COMPRESSION to a T-SQL backup statement.

Through SSMS, right-click the database you would like to back up, choose -Tasks-, -Back Up…-, -Options-, and at the bottom you will find the compression setting. (see pictures)

After you have selected the "Compress backup" option, click the -Script- option to generate the T-SQL statement. Please note that the keyword COMPRESSION is all you need to enable the feature in your backup statement.
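As a minimal sketch, the generated statement boils down to something like this (the database name and destination path are simply the ones from the test setup used later in this post):

BACKUP DATABASE [TPCH_1TB]
TO DISK = N'C:\DSI3400\LUN00\backup\TPCH_1TB-Full'
WITH COMPRESSION, INIT, STATS = 10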

Starting a Backup through SSMS

Option to select Backup Compression

To enable the backup compression feature as the default option for all your databases, change the server default with the following sp_configure command:

USE master
GO
EXEC sp_configure 'backup compression default', 1;
RECONFIGURE WITH OVERRIDE
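To verify the new server-wide default afterwards, a quick check against sys.configurations will do (just a sketch, but the view and column names are standard):

SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = 'backup compression default';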

 

Step 1: Measure the "out of the box" throughput

Running the backup query above reads more than 1400 MByte/sec on average from both DSI Solid State Disks. That's like reading all the data from two full CD-ROMs every second. Not bad!

Out of the box Backup with Compression throughput  

Waitstats

The SQL Waitstats show that the number 1 wait_type is BACKUPBUFFER.

The MSDN explanation for the BACKUPBUFFER wait_type is interesting: "the backup task is waiting for data or is waiting for a buffer in which to store data. This type is not typical, except when a task is waiting for a tape mount". Ehhh, ok… since we are not using tapes, it means that this is not typical! Let's see what we can find out about the buffers.

sys.dm_os_latch_stats order by wait_time_ms DESC 
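The exact statements behind these screenshots are not shown here, but a sketch that produces the same picture is to order the wait and latch DMVs by wait time:

SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

SELECT TOP (10) latch_class, waiting_requests_count, wait_time_ms
FROM sys.dm_os_latch_stats
ORDER BY wait_time_ms DESC;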

Traceflag

To get more insight into the backup settings there are two interesting trace flags. With these trace flags enabled, the actual backup parameters are logged in the SQL Server Errorlog.

DBCC TRACEON (3605, -1)
DBCC TRACEON (3213, -1)

The Errorlog shows that our backup was using 9 buffers and allocated 9 MB of buffer space.

9 MByte seems a relatively low value to queue up all the destination backup file data, and it smells like an area where we can get some improvement (especially since we have 16 x 4 Gbit fiber cables and 6+ GByte/sec of IO bandwidth waiting to get saturated ;-) ).

DBCC TRACEON (3605, -1) and DBCC TRACEON (3213, -1)

 

Step 2: Increase throughput by adding more destination files

A feature that not many people know about is the option to specify multiple file destinations to increase the throughput:

Specifying multiple backup destination files

To add multiple Backup destination files in the query,  add them like this:

-- 2) Adding more backup destination files:

DBCC TRACEON (3605, -1)
DBCC TRACEON (3213, -1)

BACKUP DATABASE [TPCH_1TB]
TO
  DISK = N'C:\DSI3400\LUN00\backup\TPCH_1TB-Full',
  DISK = N'C:\DSI3500\LUN00\backup\File2',
  DISK = N'C:\DSI3500\LUN00\backup\File3',
  DISK = N'C:\DSI3500\LUN00\backup\File4',
  DISK = N'C:\DSI3500\LUN00\backup\File5',
  DISK = N'C:\DSI3400\LUN00\backup\File6',
  DISK = N'C:\DSI3500\LUN00\backup\File7',
  DISK = N'C:\DSI3500\LUN00\backup\File8',
  DISK = N'C:\DSI3500\LUN00\backup\File9'
WITH NOFORMAT, INIT, NAME = N'TPCH_1TB-Full Database Backup',
  SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10

DBCC TRACEOFF (3605, -1)
DBCC TRACEOFF (3213, -1)

 
Result of adding 8 more destination files: the throughput increases from 1.4 GB/sec up to 2.2-2.4 GB/sec.

That’s a quick win to remember!

Adding extra backup files has doubled the throughput

The SQL Errorlog with the buffer configuration parameters shows that extra buffers are added automatically.

SQL Errorlog with parameters

 

Step 3: Set backup parameter options

The Backup/Restore buffer configuration output shows some interesting parameters and values!

Time to start reading the, ehh, documentation! When you highlight the word 'Backup' and hit <Shift-F1> you can read more on the topic; there are two Data Transfer Options listed: BUFFERCOUNT and MAXTRANSFERSIZE.

BUFFERCOUNT: specifies the total number of I/O buffers to be used for the backup operation.
The total space that will be used by the buffers is determined by: buffercount * maxtransfersize.
The output shows this is correct; with 49 buffers * 1024 KB there is 49 MB of total buffer space in use.

MAXTRANSFERSIZE : specifies the largest unit of transfer in bytes to be used between SQL Server and the backup media.
The possible values are multiples of 64 KB ranging up to 4194304 bytes (4 MB).  The default is 1 MB.

A third option is listed under the section Media Set Options:

BLOCKSIZE : specifies the physical block size, in bytes. The supported sizes are 512, 1024, 2048, 4096, 8192, 16384, 32768, and 65536 (64 KB) bytes.
The default is 65536 for tape devices and 512 otherwise.  
 

Trial and Measure approach

By using the famous "Trial and Measure" approach, I found that selecting the maximum value for BLOCKSIZE (65536) and doubling the MAXTRANSFERSIZE to 2097152 (2 MB) works best. Also, a BUFFERCOUNT of 2200, allocating 4400 MB of buffer space in total, works best to get the maximum throughput:

-- 3) Optimizing throughput with parameters:

DBCC TRACEON (3605, -1)
DBCC TRACEON (3213, -1)

BACKUP DATABASE [TPCH_1TB]
TO
  DISK = N'C:\DSI3400\LUN00\backup\TPCH_1TB-Full',
  DISK = N'C:\DSI3500\LUN00\backup\File2',
  DISK = N'C:\DSI3500\LUN00\backup\File3',
  DISK = N'C:\DSI3500\LUN00\backup\File4',
  DISK = N'C:\DSI3500\LUN00\backup\File5',
  DISK = N'C:\DSI3400\LUN00\backup\File6',
  DISK = N'C:\DSI3500\LUN00\backup\File7',
  DISK = N'C:\DSI3500\LUN00\backup\File8',
  DISK = N'C:\DSI3500\LUN00\backup\File9'
WITH NOFORMAT, INIT, NAME = N'TPCH_1TB-Full Database Backup',
  SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
-- Magic:
, BUFFERCOUNT = 2200
, BLOCKSIZE = 65536
, MAXTRANSFERSIZE = 2097152
GO

DBCC TRACEOFF (3605, -1)
DBCC TRACEOFF (3213, -1)

Result: the average throughput is more than doubled, and the maximum peak throughput is up to 3939 MB/sec!

Fully optimized: 2.5x higher throughput Backup

BLOCKSIZE

An interesting observation: I was more or less expecting the BLOCKSIZE change to also show up in the log, but it doesn't.

Also the “Filesystem I/O alignment” value remained the same:

SQL Errorlog with backup settings

By coincidence (I forgot to delete the backup files when adding some more destination files), I found out that the blocksize value does make a difference, as the error message shows:

Msg 3268, Level 16, State 1, Line 4
Cannot use the backup file 'C:\DSI3500\LUN00\dummy-FULL_backup_TPCH_1TB_2' because
it was originally formatted with sector size 65536
and is now on a device with sector size 512.
Msg 3013, Level 16, State 1, Line 4
BACKUP DATABASE is terminating abnormally.

Specifying the large 64 KB block size instead of the default 512 bytes typically shows a 5-6% improvement in backup throughput.

 

BACKUP to DISK = ‘NUL’

To estimate and check how fast you can read the data from a database or filegroup, there is a special destination you can back up to: DISK = 'NUL'. You only need one of those!

BACKUP DATABASE [TPCH_1TB]
FILEGROUP = 'SSD3500_0',
FILEGROUP = 'SSD3500_1'
TO DISK = 'NUL'
WITH COMPRESSION
, NOFORMAT, INIT,  SKIP, NOREWIND, NOUNLOAD
, BUFFERCOUNT = 2200
, BLOCKSIZE = 65536
, MAXTRANSFERSIZE=2097152

 

Wrap-Up

Backup compression is a great feature to use. It will save you disk capacity and reduce the time needed to back up your data. SQL Server can leverage the IO bandwidth of Solid State storage well, but to achieve maximum throughput you need to do some tuning. By adding multiple backup destination files and specifying the BUFFERCOUNT, BLOCKSIZE and MAXTRANSFERSIZE parameters you can typically double the backup throughput, which means cutting the backup time in half!

Optimizing SQL In Memory Table Scan processing speed

How fast can you actually read data from a table that is already loaded in memory?

People often assume that, once it's in memory, it's as fast as it can get and won't get any faster. But how fast is it really?

Let’s find out!

The Math Method

The total table data size can be found by looking at sp_spaceused LINEITEM_SSD, and the total execution time can be found quickly by enabling the SSMS Client Statistics.

sp_spaceused

Client Statistics

Then it's just a matter of doing the math: dividing 44491064 KB by 7607 milliseconds gives 5848.7 MByte/sec to scan through a single table in memory.

The average row read rate can be calculated as well: dividing 400 million rows by 7607 milliseconds gives about 52,590 rows read per millisecond, or roughly 52.6 million rows read per second.

Another option is to capture the stats with SQL Profiler: dividing 5565037 reads by 6601 milliseconds gives 832.96 reads per millisecond; multiplied by 8192 bytes per page, that is about 6.5 GB/sec.
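As a quick sketch, the same back-of-the-envelope math in T-SQL (the sizes and timings are the ones measured above and will of course differ on other systems):

DECLARE @data_kb    bigint = 44491064;   -- data size reported by sp_spaceused
DECLARE @elapsed_ms int    = 7607;       -- elapsed time from the SSMS Client Statistics
DECLARE @rows       bigint = 400000000;  -- rows in LINEITEM_SSD

SELECT CAST(@data_kb AS float) / @elapsed_ms     AS approx_mb_per_sec,   -- KB per ms ~ MB per sec
       CAST(@rows AS float) / @elapsed_ms / 1000 AS approx_million_rows_per_sec;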

SQL Profiler

These two examples already show a serious deviation in the average achieved throughput while scanning the same data from memory.

So what other options do we have to determine the effective throughput figures, preferably in real time?

Page lookups/sec

The best metric I typically use to check how fast you can scan data in the SQL Buffer Pool is to start the Windows Performance Monitor (Perfmon.exe) and look at the "Page lookups/sec" counter, which is listed under the SQL Server Buffer Manager object:

Windows Perfmon counter selection

It displays the "Number of requests to find a page in the buffer pool", but I noticed that it is also a representative counter to check how much data SQL is processing under the hood.

Each page is 8 kilobytes in size, so by multiplying the "Page lookups/sec" number by 8192 you get an idea of how many bytes are touched each second.
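If you prefer T-SQL over Perfmon, the same counter can also be sampled from sys.dm_os_performance_counters; the sketch below takes two samples five seconds apart and converts the delta to bytes (the object_name filter may need adjusting for a named instance):

DECLARE @c1 bigint, @c2 bigint;

SELECT @c1 = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page lookups/sec'
  AND object_name LIKE '%Buffer Manager%';

WAITFOR DELAY '00:00:05';

SELECT @c2 = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page lookups/sec'
  AND object_name LIKE '%Buffer Manager%';

SELECT (@c2 - @c1) / 5              AS page_lookups_per_sec,
       (@c2 - @c1) / 5 * 8192 / 1e9 AS approx_gb_touched_per_sec;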

Jeroen Kooiman, a great programmer and colleague of mine, wrote a nice graphical utility that allows me to show this type of derived value in real time.

 

Test 4a)  Select COUNT (*) from LINEITEM_SSD

An easy way to force a full tablescan is to execute the command  “Select COUNT (*) from <table>”. 
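For our test table that is simply (assuming the dbo schema):

SELECT COUNT(*) FROM dbo.LINEITEM_SSD;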

When we execute this command on our LINEITEM test table of 43 GByte with 400 million rows and read it from Solid State storage, it takes 38 seconds.

But once it is in memory, the Page lookups/sec counter shows values between 770,000 and 800,000. It peaks at 6.1 GB/sec, SQL Server uses 1600% processor time (a total of 16 cores) while executing, and the query completes in 7.7 seconds, almost 5 times faster than reading from Solid State storage.

Client  Statistics

 Table Scan Speed of a Single table

So we can scan a regular table which is in memory at over 6 GB/sec. In my previous post on optimizing table scan speed I showed that partitioning a table brings a significant throughput gain, and compressing the table even more. Let's find out if partitioning and compressing the pages has an impact on the average scan speed in memory too!

Test 4b)  Select COUNT (*) from  a Partitioned table

We run this test on a 96 core Unisys ES7000 system with 512 GB of RAM. Plenty of resources and memory bandwidth available.

With the table partitioned into 96 partitions, the tablescan speed increases with the SQL 2008 R2 Nov. CTP to 27.3 GB/sec, and almost 61 cores are used. Scanning a partitioned table in memory is more than 4 times faster than scanning an unpartitioned table and uses almost 4x more CPU resources.

Partitioned Table Scanspeed

SQL Profiler shows it takes only 2.2 seconds to read 400 million rows…

SQL Profiler

 

Test 4c) SELECT count (*) from LINEITEM_Hash96Key_SSD option (MAXDOP 96)

During the previous test we were getting close to the default MAXDOP of 64 cores and not yet utilizing all 96 cores. By specifying the query hint "option (MAXDOP 96)" we can check if SQL will use more:

SELECT count (*) from LINEITEM_Hash96Key_SSD option (MAXDOP 96)

With the hint to use a maximum of 96 cores, the tablescan of 400 million rows / 46 GByte completes within 2 seconds!

SELECT count (*) from LINEITEM_Hash96Key_SSD option (MAXDOP 96)

Test 4d) Table scanning a Partitioned and Compressed table that is stored as a Clustered Index

We ended the blog post about tablescan optimization with the conclusion that a partitioned and compressed table provides the best throughput when reading from disk. Will this also be true for the data once it resides in memory?

(Command to compress the pages: ALTER TABLE [dbo].[LINEITEM_hash96key_SSD_Compressed] REBUILD WITH (DATA_COMPRESSION = PAGE) )

To determine the compression state of partitions in a partitioned table, query the sys.partitions catalog view and check the data_compression column:

Select * from sys.partitions where ..

Select * from sys.partitions where .. 
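The WHERE clause is cut off in the screenshot above; a sketch of such a check, filtered on the compressed test table, could look like this:

SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.partition_number,
       p.data_compression_desc,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.LINEITEM_hash96key_SSD_Compressed')
ORDER BY p.partition_number;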

When we check the table with sp_spaceused LINEITEM_hash96key_SSD_Compressed, we see that the data size has been reduced significantly because of the compression: from 44491064 KB it is down to 19328312 KB, which is 56% less.

Compressed

Executing the query SELECT count (*) from LINEITEM_Hash96Key_SSD_Compressed shows that the Page lookups/sec increases from 800K to over 1.8 million. The graph shows that the 400 million rows are scanned at 14 GB/sec, utilizing 44.5 cores. But what's even more important, the query completes within only 1.3 to 1.4 seconds!

In memory Table Scan speed of a partitioned and compressed table

 

Compressed table

Wrap up

By looking at the Page Lookups/sec perfmon counter,  you can get a pretty good picture of how much data SQL Server is processing. 

The examples show that the overall memory scan speed increases significantly (over 4 times) when you partition a table, and the execution duration also drops significantly.

By compressing the partitioned table, the tablescan is even faster while utilizing less CPU resources and memory space, and most important, the overall duration of scanning 400 million rows stored in memory drops from 7.6 seconds to 1.4 seconds!