SQL under the hood (Part 1): Monitoring Current- & Pending Disk IOs

Do you know how many disk IO’s are being processed by your SQL Servers at this very moment? And how many of them are pending, waiting to be serviced? Which databases are the top IO consumers and which queries are actually creating the majority of the IO load? In this blog I will focus on the upper part of the IO chain and show you how to monitor the IOs from within SQL Server itself.

Pending Disk IOs

You probably have seen this message in a SQL Error log before: “SQL Server has encountered xxx occurrence(s) of I/O requests taking longer than 15 seconds to complete on file…”

This indicates you may have some (CPU drift going on, or indeed some-) serious Disk IO pressure; waiting 15 seconds or more for an queued IO to complete is a very long time… But what about all other IO’s that take somewhere between 0 and 14.99 seconds to complete?

 SQL ErrorLog entries

There’s a document published that reveals many tricks to look under the hood of SQLServer; Troubleshooting Performance Problems in SQL Server 2008 

I enhanced one of the queries a little bit, that actually lists the pending IOs from a SQL perspective in such a way that it will show on a per database- and physical file name basis how long IO’s are actually pending.

/* V1.0 –  18 dec. 2009 Henk van der Valk
Based on query from the Troubleshooting Perf. Probl.in SQL Server 2008:
http://download.microsoft.com/download/D/B/D/DBDE7972-1EB9-470A-BA18-58849DB3EB3B/TShootPerfProbs2008.docx
*/

SELECT db.name AS database_name
    , mf.name AS file_name
    , mf.physical_name
    , vfs.io_stall
    ,t2.io_pending_ms_ticks as IO_pending_millisec_ticks
    ,t2.scheduler_address
    ,sched.scheduler_id        
FROM  sys.dm_io_virtual_file_stats(NULL, NULL)vfs
    JOIN sys.dm_io_pending_io_requests as t2
        ON vfs.file_handle = t2.io_handle
    JOIN sys.databases db
        ON vfs.database_id = db.database_id
    JOIN sys.master_files mf
        ON vfs.database_id = mf.database_id
        AND vfs.file_id = mf.file_id
    JOIN sys.dm_os_schedulers sched
    on t2.scheduler_address = sched.scheduler_address
    
ORDER BY io_pending_ms_ticks DESC

SQL Pending IOs Query Output

The sample query output below shows that even a simple table scan operation creates queuing on the database file level of several hundred milliseconds.

Sample Pending Disk IO Query output  with some load on the spindles

This is very valuable information when you need to discuss IO response timings with the storage department since file access timings and queued IOs will cause very little confusion.

But there is more to check; to get a complete picture you should also know how many IOs are actually being processed by each of these database files. (Before thinking of enhancing the above query,) there’s an option built-in into SQL that addresses this graphically; the Activity Monitor.

SSMS SQL Activity Monitor

To find out quickly which databases are “Hot” and which are “Not”, you may have a look at the SSMS Activity Monitor. The Activity monitor will show you in by default after every 10 seconds, the real-time Read & Write Megabytes Data File I/O statistics for all the database- and log files. Besides the effective throughput, the Activity Monitor also reports the IO response time for every database- and log file. (if needed, you may change the refresh interval to 1 second; just right-click [Overview] and select [Refresh Interval]).

Changing the Activity monitor Refresh Interval

image

The lower part of the Activity monitor; allows you to also identify the source of the disk activity;  just have a look at the “Recent Expensive Queries”  part from the Activity monitor , or

select a SSMS build-in report like ‘Top queries by Total IO’.

image 

Physical Reads/sec :  Indeed , this value is direct related to the Windows Perfmon Logical and Physical Disk counter :  Disk Reads/Sec.
Logical Writes+Reads/sec :  multiplied by 8192 , this values represents the effective MBytes/sec  processed. 

You could also download queries like Adam Machanic’s   “who_is_active”  SP to have a  real-time overview of  physical Reads, Reads and Writes which will also pull the deltas on various metrics if you enable it by changing   the @DELTA_INTERVAL TINYINT from 0 into 1. (Thanks for pointing me to this one André). (Please note the SQL2008R2 Nov CTP will only return any IO values when MDOP is set to 1 on a multi core system,thanks for the tip, Thomas ;-))  

Windows Performance Monitor

With this information handed over by SQL we already have a pretty good picture of the Disk IO throughput and response timings. The next part of the IO chain we have to investigate is look at the IOs from the Windows Operation system perspective. Best place to start is to fire up the Windows Performance Monitor (perfmon) and select the “Logical disk” object counters like the “Average Read- and Write Queue length”. If indeed the disks are under severe load the disk queues on a per disk/LUN basis will also show hundreds of Pending IOs. To validate if the IO response timings are showing similar figures as SQL , select the “Avg. Disk Seconds/Read and – Write” counters. These values are measured in milliseconds. The pictures below show an total average disk Read Queue length of around 150 queued IOs with an average response time of about 1 millisecond each, which means pending IO values could be as high as 150 milliseconds on a SQL database file level.

Avg Disk Queue length

Milliseconds per Disk IO

Wrap up

The SQL DMV’s and the SSMS Activity Monitor allow you to monitor and quickly identify where the SQL IO workload is coming from, how long IOs are pending before being serviced and  which database & log files are actually “Hot”. In the part 2 I will show you how to snapshot the virtual database- and log file IO statistics and start building an accurate IO workload profile.

Henk posted at 2010-3-10 Category: Performance tip, SQL 2008 R2, Storage: SAN & SSD