Introducing SQL Data Warehouse – MPP in the cloud!

Yesterday SQL DW went live into public preview. With SQL DW we can now process a lot! of data at cloud scale by leveraging its Massive Parallel Processing (MPP) engine. Yet another milestone in SQL Server history! A lot of information has been published over the last couple of days including some introduction videos and a  ‘getting started’ by JRJ!

SQL DW videos

I have been part of an internal test team and have been working on a SQLDW instance that is located in the ‘SQL Japan West’ Datacenter a couple of thousands of miles from my hometown in the Netherlands; below some information & queries that might be handy for your future SQL DW toolbox!

Loading data into SQL DW

SQL DW is built upon Microsoft’s Analytics Platform System (APS) and its Massive parallel processing (MPP) capabilities. Many of the great options from APS, like defining a CCI index and table distribution key, as part of the CREATE TABLE statement are ready to go:

image

The recommended method to load data into SQL DW will very likely become AZCopy for Data Transfer into the cloud and PolyBase for the data load into the database. But to get started, you might stick to the BCP utility for the moment. As a quick test I uploaded a lineItem file to Japan with 600K rows and noticed an old school trick to increase the throughput also applies: increasing the network packet size from the default 4096 (4KB) to the maximum supported value of 16192 brings you an 10% improvement in load speed. 

Example of a BCP Command syntax:

bcp.exe COETSPDB.dbo.lineitem_cci in C:\Toolbox\dbgen\lineitem.tbl -c -U Henk
–S some_nr.database.windows.net –a 16192 –P ‘secret’ -q -t"|" 

Data upload throughput with various network packet sizes:

image

 

Monitoring active Queries

The preferred method to monitor the activity of your SQL DW will be via the Microsoft Azure Portal (See picture below). The very rich and enhanced query monitoring features are not yet publicly available so below a couple of queries from my APS toolbox to help you retrieve all the information you might be looking for!

MS Azure Portal

 

All recent Query requests

-- SQL DW - Query status like: Completed , Cancelled, Failed,Running, 
select * from sys.dm_pdw_exec_requests 
where status like 'Running'

As query output a list with all the active submitted queries (total_elapsed_time in milliseconds):

 

MPP Query Plan

To zoom into the steps details of a Query Massive Parallel Processing (MPP) execution plan,

you have to substitute the Query request ID (QID)  of interest to pull the details:

select * from sys.dm_pdw_request_steps  where request_id like 'QID552835'
ORDER BY Step_index ASC

 

sys.dm_pdw_request_steps 

 

Retrieve the entire Query history from an SQL DW database

-- Query history : List all queries executed on SQL DW: 

select q.[request_id], q.[status], q.[submit_time] as start_time, q.[end_time], q.[total_elapsed_time], q.[command], q.[error_id], q.[session_id], s.[login_name], q.[label] 
from sys.dm_pdw_exec_requests q inner join sys.dm_pdw_exec_sessions s on s.[session_id] = q.[session_id] 
where LEFT(s.client_id, 9) <> '127.0.0.1' 
and [start_time] > '06/25/2015 08:00'
order by [start_time] desc OPTION (label = 'Rd_Query_history')

Sample Output: an overview of all the queries executed against a SQL DW/APS database:

image

 

SQL DW supports 2 tables formats: Hash distributed and Round_Robin.

To view the tables names and their corresponding distribution type  run:

SELECT a.name AS [Table Name], 
b.distribution_policy_desc AS [Distribution Type]
FROM sys.tables AS a 
JOIN sys.pdw_table_distribution_properties AS b 
    ON a.object_id = b.object_id

Output:

image

 

Distribution Column overview

 

To list the distribution column and data type of all HASH distributed tables:

SELECT a.name AS [Table Name], c.name AS [Distribution Column], 
b.name AS [Data Type]
FROM sys.pdw_column_distribution_properties AS d
INNER JOIN sys.columns AS c 
    ON c.object_id = d.object_id
INNER JOIN sys.tables AS a 
    ON a.object_id = d.object_id
LEFT OUTER JOIN sys.types AS b 
    ON c.user_type_id = b.user_type_id
WHERE d.distribution_ordinal = 1 AND c.column_id = d.column_id

Sample query output:

 

Wrap-Up

SQL DW is a great and very welcome addition to the SQL Server family! Its designed with ease of use in mind and can handle petabytes of data for you by leveraging the Massive parallel processing (MPP) capabilities as introduced with APS. In the text above you have seen a couple of queries and their output to help you to get familiarized with the concept of the MPP query execution behavior of SQL DW.

…Just imagine for a second what new insights SQL DW with its massive cloud scale can & will bring!

Henk posted at 2015-6-25 Category: Azure, SQL DW | Tags:

4 Responses Leave a comment

  1. #1Alain Dormehl @ 2015-7-2 17:06

    This is so exciting !

    Thanks for all the amazing APS posts. They are very scarce to find.

    Is there a direct correlation between DWU and Distributions in an on premise appliance ?

  2. Henk @ 2015-7-3 09:29

    Hi Alain, thanks!
    In the current SQL Data Warehouse public preview there are 60 distributions assigned to each SQLDW instance.
    When scaling up the horsepower by increasing the number of DWU’s, more CPU’s and Memory is assigned to the SQL grid to provide more in-memory cache and also more bandwidth for Polybase when importing/exporting data!

  3. #2nidhi @ 2015-11-12 17:04

    Hi
    How can I increase the 10K limit of sys.dm_pdw_exec_requests

  4. Henk @ 2015-11-16 11:54

    Hi Nidhi, at the moment you can’t; (please provide the product team direct feedback why you would need more?)

Leave a Reply

You must be logged in to post a comment.