Introducing SQL Data Warehouse – MPP in the cloud!
Yesterday SQL DW went live into public preview. With SQL DW we can now process a lot! of data at cloud scale by leveraging its Massive Parallel Processing (MPP) engine. Yet another milestone in SQL Server history! A lot of information has been published over the last couple of days including some introduction videos and a ‘getting started’ by JRJ!
I have been part of an internal test team and have been working on a SQLDW instance that is located in the ‘SQL Japan West’ Datacenter a couple of thousands of miles from my hometown in the Netherlands; below some information & queries that might be handy for your future SQL DW toolbox!
Loading data into SQL DW
SQL DW is built upon Microsoft’s Analytics Platform System (APS) and its Massive parallel processing (MPP) capabilities. Many of the great options from APS, like defining a CCI index and table distribution key, as part of the CREATE TABLE statement are ready to go:
The recommended method to load data into SQL DW will very likely become AZCopy for Data Transfer into the cloud and PolyBase for the data load into the database. But to get started, you might stick to the BCP utility for the moment. As a quick test I uploaded a lineItem file to Japan with 600K rows and noticed an old school trick to increase the throughput also applies: increasing the network packet size from the default 4096 (4KB) to the maximum supported value of 16192 brings you an 10% improvement in load speed.
Example of a BCP Command syntax:
bcp.exe COETSPDB.dbo.lineitem_cci in C:\Toolbox\dbgen\lineitem.tbl -c -U Henk
–S some_nr.database.windows.net –a 16192 –P ‘secret’ -q -t"|"
Data upload throughput with various network packet sizes:
Monitoring active Queries
The preferred method to monitor the activity of your SQL DW will be via the Microsoft Azure Portal (See picture below). The very rich and enhanced query monitoring features are not yet publicly available so below a couple of queries from my APS toolbox to help you retrieve all the information you might be looking for!
All recent Query requests
-- SQL DW - Query status like: Completed , Cancelled, Failed,Running, select * from sys.dm_pdw_exec_requests
where status like 'Running'
As query output a list with all the active submitted queries (total_elapsed_time in milliseconds):
MPP Query Plan
To zoom into the steps details of a Query Massive Parallel Processing (MPP) execution plan,
you have to substitute the Query request ID (QID) of interest to pull the details:
select * from sys.dm_pdw_request_steps where request_id like 'QID552835'
ORDER BY Step_index ASC
Retrieve the entire Query history from an SQL DW database
-- Query history : List all queries executed on SQL DW: select q.[request_id], q.[status], q.[submit_time] as start_time, q.[end_time], q.[total_elapsed_time], q.[command], q.[error_id], q.[session_id], s.[login_name], q.[label] from sys.dm_pdw_exec_requests q inner join sys.dm_pdw_exec_sessions s on s.[session_id] = q.[session_id] where LEFT(s.client_id, 9) <> '127.0.0.1' and [start_time] > '06/25/2015 08:00' order by [start_time] desc OPTION (label = 'Rd_Query_history')
Sample Output: an overview of all the queries executed against a SQL DW/APS database:
SQL DW supports 2 tables formats: Hash distributed and Round_Robin.
To view the tables names and their corresponding distribution type run:
SELECT a.name AS [Table Name], b.distribution_policy_desc AS [Distribution Type] FROM sys.tables AS a JOIN sys.pdw_table_distribution_properties AS b ON a.object_id = b.object_id
Output:
Distribution Column overview
To list the distribution column and data type of all HASH distributed tables:
SELECT a.name AS [Table Name], c.name AS [Distribution Column], b.name AS [Data Type] FROM sys.pdw_column_distribution_properties AS d INNER JOIN sys.columns AS c ON c.object_id = d.object_id INNER JOIN sys.tables AS a ON a.object_id = d.object_id LEFT OUTER JOIN sys.types AS b ON c.user_type_id = b.user_type_id WHERE d.distribution_ordinal = 1 AND c.column_id = d.column_id
Sample query output:
Wrap-Up
SQL DW is a great and very welcome addition to the SQL Server family! Its designed with ease of use in mind and can handle petabytes of data for you by leveraging the Massive parallel processing (MPP) capabilities as introduced with APS. In the text above you have seen a couple of queries and their output to help you to get familiarized with the concept of the MPP query execution behavior of SQL DW.
…Just imagine for a second what new insights SQL DW with its massive cloud scale can & will bring!
4 Responses Leave a comment
This is so exciting !
Thanks for all the amazing APS posts. They are very scarce to find.
Is there a direct correlation between DWU and Distributions in an on premise appliance ?
Hi Alain, thanks!
In the current SQL Data Warehouse public preview there are 60 distributions assigned to each SQLDW instance.
When scaling up the horsepower by increasing the number of DWU’s, more CPU’s and Memory is assigned to the SQL grid to provide more in-memory cache and also more bandwidth for Polybase when importing/exporting data!
Hi
How can I increase the 10K limit of sys.dm_pdw_exec_requests
Hi Nidhi, at the moment you can’t; (please provide the product team direct feedback why you would need more?)