Introducing Microsoft SQL Server 2016 R Services

With the release of SQL Server 2016 CTP3 and its native in-database support for the open source R language, you can now call R and RevoScaleR functions and scripts directly from within a SQL query and benefit from multi-threaded and multi-core in-DB computations. Yet another SQL Server milestone! The R integration brings a lot of benefits and new scenarios to your end users; it brings the utility of data science to your applications without the need to ‘export’ the data to your R environment!

Installing & Enabling SQLR integration

To get started you will need to install and configure SQL Server 2016 CTP3, Revolution R Open (RRO) and Revolution R Enterprise (RRE) on the server.

 

SQL Server 2016 CTP3 Setup and RRE installers

The versions I installed:
– SQL Server 2016 (CTP3.0) – 13.0.700.139 (X64)
– Revolution R Open, The Enhanced Open Source R distribution (R version 3.2.2 (2015-08-14) )
– Revolution R Enterprise for SQL Server 2016 Community Technology Preview (CTP) 3 – (7.5.0)

(To download the bits: https://msdn.microsoft.com/en-us/library/mt604847.aspx)

The R integration uses a new service called ‘SQL Server Launchpad’ to ‘launch’ Advanced Analytics Extensions processes; it enables the integration with Microsoft R Open using standard T-SQL statements. (Disabling this service will make the Advanced Analytics features of SQL Server unavailable.)

New SQL Server Launchpad service
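
You can also check the services from T-SQL via the server services DMV; note that it is my assumption that the CTP3 build already lists the Launchpad row there:

SELECT servicename, status_desc, startup_type_desc
FROM sys.dm_server_services;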

To enable the R integration, you have to enable the ‘external scripts enabled’ option via sp_configure and grant users permission to execute R scripts via the new db_rrerole role:

sp_configure 'external scripts enabled', 1
reconfigure
GO

-- grant a user permission to run R scripts via the new role
alter role db_rrerole add member HenkR;
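
(The setting may require a restart of the SQL Server service before it becomes active.) A quick way to verify the setup end-to-end is a minimal round trip through the R runtime; this sketch simply echoes a one-row input back as the query result:

EXECUTE sp_execute_external_script
                @language = N'R'
              , @script = N'OutputDataSet <- InputDataSet'
              , @input_data_1 = N'SELECT 1 AS ping'
              WITH RESULT SETS ((ping int not null));
go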

 

Connecting to SQL Server from your favorite R environment

Traditionally you would connect from your favorite R IDE to SQL Server to retrieve data for analytics, or you can use the R GUI environment that is part of the installer bits. The default installation path to RGui is: “C:\Program Files\RRO\RRO-3.2.2-for-RRE-7.5.0\R-3.2.2\bin\x64\Rgui.exe”. Also popular is RStudio Desktop, which is separately downloadable from www.Rstudio.com (the version I installed to get started is V0.99.486).

 

1) Connecting R to SQL Server

To connect your RStudio environment to SQL Server, the traditional way to read data (and import it into memory or a local *.xdf file) would look something like this. It requires setting up a proper connection string; use ‘hostname\\instancename’ to connect to a SQL Server named instance and pull some data from a table:

sqlServerConnString <- "Driver=SQL Server;Server=SQL2016EE\\SQL2016CTP3;Database=Oldskool_DB;Uid=sa;Pwd=MyPassword"

sqlServerDataDS <- RxSqlServerData(sqlQuery = "SELECT * FROM LINEITEM_Small",
                                   connectionString = sqlServerConnString)

 

2) Offloading R scripts to SQL Server

 

Pulling a large dataset from a database and processing it locally used to require writing it to a local file first, which is a single-threaded process that can take a long time… Luckily, with CTP3 we can now bring the R script to the data and process it there! To execute an R script directly from a SQL query, use the new ‘sp_execute_external_script’ procedure to, for example, calculate the mean of L_QUANTITY via R:

EXECUTE  sp_execute_external_script
                @language = N'R'
              , @script = N'OutputDataSet <- data.frame(mean(InputDataSet[,1]))'
              , @input_data_1 = N'SELECT L_QUANTITY FROM LINEITEM_small'
              WITH RESULT SETS ((col float not null));
go
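
You can also give the input and output data frames friendlier names than the default InputDataSet/OutputDataSet and return multiple named columns. A small sketch along those lines (the statistics are chosen purely for illustration):

EXECUTE sp_execute_external_script
                @language = N'R'
              , @script = N'res <- data.frame(avg_qty = mean(lineitem$L_QUANTITY),
                                              sd_qty  = sd(lineitem$L_QUANTITY))'
              , @input_data_1 = N'SELECT L_QUANTITY FROM LINEITEM_small'
              , @input_data_1_name = N'lineitem'
              , @output_data_1_name = N'res'
              WITH RESULT SETS ((avg_qty float not null, sd_qty float not null));
go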

SQL Server will execute this external script via its new Launchpad service in the BxLserver.exe process (the Revolution Analytics communication component for SQL Server).

R Memory allocation

What would an article about ‘R’ be without a note about memory usage!
By default, the SQLR integration service in the CTP3 build will use up to 20% of the total physical memory available on the host for all its in-DB running R sessions. You can monitor the memory usage of the BxLserver.exe process via the Working Set memory column in Task Manager.

Monitor the BxlServer process memory usage via Windows Task manager

If you cross that limit, your query will fail with a message similar to the one below:

Msg 39004, Level 16, State 20, Line 8
A ‘R’ script error occurred during execution of ‘sp_execute_external_script’ with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 8
An external script error occurred:
Error in sqlSatelliteCall() :
Calls: sqlSatelliteCall -> .Call
Execution halted
Invalid BXL stream
STDOUT message(s) from external script:
Failed to allocate or reallocate memory.

 

 

The resolution for running out of memory like this is to increase the maximum memory allocation, a setting in the Rlauncher.config file (located in the C:\Program Files\Microsoft SQL Server\MSSQL13… \MSSQL\Binn directory). Increase the default setting to a value that is still safe, so it will not interfere with the SQL Server memory allocation.

The last line in the file, ‘MEMORY_LIMIT_PERCENT=20’, controls the maximum percentage of physical memory that gets allocated to all the R sessions combined. Increase the value to allow the R service to consume more memory:

Rlauncher Config file
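
For example (a sketch; 40 is an arbitrary illustration value, pick one that leaves SQL Server enough room), the edited line would read:

MEMORY_LIMIT_PERCENT=40

After saving the file, restarting the SQL Server Launchpad service should make the new limit effective.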

 

Wrap-Up

Embracing and operationalizing R functionality via the new SQLServer R Services integration has become very simple! It brings a lot of new possibilities to apply advanced analytics to your SQL data, ranging from data exploration to Predictive Modeling. Getting started with R might be a bit challenging but it will enrich your reports for sure!

Introducing SQL Data Warehouse – MPP in the cloud!

Yesterday SQL DW went live in public preview. With SQL DW we can now process a LOT of data at cloud scale by leveraging its Massive Parallel Processing (MPP) engine. Yet another milestone in SQL Server history! A lot of information has been published over the last couple of days, including some introduction videos and a ‘getting started’ by JRJ!

SQL DW videos

I have been part of an internal test team and have been working on a SQL DW instance located in the ‘SQL Japan West’ datacenter, a couple of thousand miles from my hometown in the Netherlands; below you will find some information & queries that might be handy for your future SQL DW toolbox!

Loading data into SQL DW

SQL DW is built upon Microsoft’s Analytics Platform System (APS) and its Massive Parallel Processing (MPP) capabilities. Many of the great options from APS, like defining a clustered columnstore index (CCI) and a table distribution key as part of the CREATE TABLE statement, are ready to go:

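A minimal sketch of such a CREATE TABLE statement (the column list is illustrative, not the full lineitem definition):

CREATE TABLE dbo.lineitem_cci
(
    L_ORDERKEY   bigint         NOT NULL,
    L_PARTKEY    bigint         NOT NULL,
    L_QUANTITY   decimal(15, 2) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,        -- CCI storage format
    DISTRIBUTION = HASH(L_ORDERKEY)     -- or DISTRIBUTION = ROUND_ROBIN
);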

The recommended method to load data into SQL DW will very likely become AzCopy for the data transfer into the cloud and PolyBase for the data load into the database, but to get started you might stick to the BCP utility for the moment. As a quick test I uploaded a lineitem file with 600K rows to Japan and noticed that an old school trick to increase the throughput also applies here: increasing the network packet size from the default 4096 (4 KB) to the maximum supported value of 16192 brings you a 10% improvement in load speed.

Example of a BCP Command syntax:

bcp.exe COETSPDB.dbo.lineitem_cci in C:\Toolbox\dbgen\lineitem.tbl -c -U Henk -S some_nr.database.windows.net -a 16192 -P 'secret' -q -t"|"

Data upload throughput with various network packet sizes:


 

Monitoring active Queries

The preferred method to monitor the activity of your SQL DW will be via the Microsoft Azure Portal (see picture below). The very rich and enhanced query monitoring features are not yet publicly available, so below are a couple of queries from my APS toolbox to help you retrieve all the information you might be looking for!

MS Azure Portal

 

All recent Query requests

-- SQL DW - query status can be: Completed, Cancelled, Failed, Running
select * from sys.dm_pdw_exec_requests
where status like 'Running'

The output is a list of all the actively submitted queries (total_elapsed_time is in milliseconds):

 

MPP Query Plan

To zoom in on the step details of a query’s Massive Parallel Processing (MPP) execution plan, substitute the query request ID (QID) of interest to pull the details:

select * from sys.dm_pdw_request_steps  where request_id like 'QID552835'
ORDER BY Step_index ASC

 

sys.dm_pdw_request_steps 
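
To drill one level deeper and see what ran on the individual distributions for a given step, the sys.dm_pdw_sql_requests DMV can be queried with the same request ID (the step index below is just an example):

select * from sys.dm_pdw_sql_requests
where request_id like 'QID552835' and step_index = 2
ORDER BY distribution_id ASC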

 

Retrieve the entire query history from a SQL DW database

-- Query history : List all queries executed on SQL DW: 

select q.[request_id], q.[status], q.[submit_time] as start_time, q.[end_time], q.[total_elapsed_time], q.[command], q.[error_id], q.[session_id], s.[login_name], q.[label]
from sys.dm_pdw_exec_requests q inner join sys.dm_pdw_exec_sessions s on s.[session_id] = q.[session_id]
where LEFT(s.client_id, 9) <> '127.0.0.1'
and q.[submit_time] > '06/25/2015 08:00'
order by [start_time] desc OPTION (label = 'Rd_Query_history')
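
The OPTION (label = ...) hint at the end tags the request with a name, which makes it easy to find back in the DMVs later on, for example:

select [request_id], [status], [command]
from sys.dm_pdw_exec_requests
where [label] = 'Rd_Query_history'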

Sample output: an overview of all the queries executed against a SQL DW/APS database.


 

SQL DW supports two table formats: HASH distributed and ROUND_ROBIN. To view the table names and their corresponding distribution type, run:

SELECT a.name AS [Table Name], 
b.distribution_policy_desc AS [Distribution Type]
FROM sys.tables AS a 
JOIN sys.pdw_table_distribution_properties AS b 
    ON a.object_id = b.object_id


 

Distribution Column overview

 

To list the distribution column and data type of all HASH distributed tables:

SELECT a.name AS [Table Name], c.name AS [Distribution Column], 
b.name AS [Data Type]
FROM sys.pdw_column_distribution_properties AS d
INNER JOIN sys.columns AS c 
    ON c.object_id = d.object_id
INNER JOIN sys.tables AS a 
    ON a.object_id = d.object_id
LEFT OUTER JOIN sys.types AS b 
    ON c.user_type_id = b.user_type_id
WHERE d.distribution_ordinal = 1 AND c.column_id = d.column_id


 

Wrap-Up

SQL DW is a great and very welcome addition to the SQL Server family! It’s designed with ease of use in mind and can handle petabytes of data for you by leveraging the Massive Parallel Processing (MPP) capabilities introduced with APS. In the text above you have seen a couple of queries and their output to help you get familiar with the MPP query execution behavior of SQL DW.

…Just imagine for a second what new insights SQL DW with its massive cloud scale can & will bring!