Creating R Stored Procedures in SQL Server 2016 Using sqlrutils


In the Microsoft R Server forum the other day was a question about inputs to SQL Server R stored procedures, or rather about "strange" parameter naming, when using the sqlrutils package. I got intrigued by the question and started playing around with researching it, and this blog-post is the result. Oh, and this blog-post would never have seen the light of the day, if it hadn't been for the original poster of the question: JD Long, thanks JD!

So, what is SQL Server R stored procedures and what is sqlrutils?

Before we answer the question above, let's do a quick recap of sp_execute_external_script.

sp_execute_external_script

The procedure sp_execute_external_script is an extended stored procedure, and it allows us to execute R scripts (and in SQL 2017 also Python) from inside SQL Server. You use it by passing it the script you want to execute as well as various parameters. Let's assume you have some R code looking like so:

Iris Dataset
# set a variable
multiplier <- 5
#get the iris data set as a data frame
iris_dataset <- iris
# grab the setosa species
setosa <- iris[iris$Species == 'setosa',]
# calculate mean of the Sepal.Width for setosa
menSepWidth <- mean(setosa$Sepal.Width)
# use the multiplier to do some "stuff
iris_dataset$Sepal.Length <- iris_dataset$Sepal.Length * multiplier
# look at the resulting dataset
View(iris_dataset$Sepal.Length)
# print out the mean
print(menSepWidth)

Code Snippet 1: R Script

As you see in Code Snippet 1, we:

Yes, yes, I know - the code is very, very simplistic, but it will still get the points across.

NOTE: If you haven't heard about the iris dataset before, you can read more about it here.

To use this script in SQL Server with sp_execute_external_script, you would write something like this:

R Script in SQL
DECLARE @out_val float;
exec sp_execute_external_script
@language = N'R',
@script = N'
iris_dataset <- iris
setosa <- iris[iris$Species == "setosa",]
menSepWidth <- mean(setosa$Sepal.Width)
iris_dataset$Sepal.Length <- iris_dataset$Sepal.Length * multiplier
OutputDataSet <- data.frame(iris_dataset$Sepal.Length)
',
@params = N'@multiplier float, @menSepWidth float OUTPUT',
@multiplier = 5,
@menSepWidth = @out_val OUTPUT
WITH RESULT SETS ((SepalLength float));
SELECT @out_val AS MeanSepWidth

Code Snippet 2: The iris Script in SQL Server

As you see from Code Snippet 2, the R script is passed in to the @script parameter of sp_execute_external_script. The actual script looks almost identical to what is in Code Snippet 1, except for instead of the View(iris_dataset$Sepal.Length), we use OutputDataSet. The OutputDataSet variable is a well known variable in SQL Server R Services, and it is the default variable that will contain the data to be returned to SQL Server as a result-set upon completion of the stored procedure call. The layout of the result-set is defined in the WITH RESULT SETS statement, and in Code Snippet 2 above we are interested in one column, of type float and we name it SepalLength.

The parameters we use are defined in the @params parameter, followed by the parameters themselves. In this case @multiplier which we set to 5, and @menSepWidth which is OUTPUT. If you have used sp_executesql the parameter syntax should be familiar to you. We assign the (at the top of the script) declared @out_val variable to the @menSepWidth output parameter, and at the end of the script we SELECT it out.

When running Code Snippet 2, the output will be like in the two figures below:

Figure 1: Result-set

Figure 1 shows the output from OutputDataSet, the calculated Sepal.Length column.

Figure 2: Output Parameter

In Figure 2 we see the output parameter; @menSepWidth.

This is nice, being able to execute sp_execute_external_script, and run our R scripts in SQL Server. However how do I operationalize this, I can't really be running these scripts every time I want to "crunch" some data?! Well, the answer to that is to wrap the code we see in Code Snippet 2 in a stored procedure, which would take the necessary parameters, and then execute sp_execute_external_script inside the procedure.

To make it easier to create wrapping procedures, Microsoft has an R package to help with this: sqlrutils.

sqlrutils

The sqlrutils package allows R users to, from inside an R IDE of choice, put their R scripts into a T-SQL stored procedure, register that stored procedure with a database, and run the stored procedure from an R development environment.

Let us see what the package does. To do that run from inside an R IDE, I am using R Tools for Visual Studio 2017 (RTVS), help(package="sqlrutils").

NOTE: Before you run the help statement above, you may have to run: library(sqlrutils), to load the package.

When running `help(package="sqlrutils"), the httpd help server starts and you should see something like so:

Figure 3: Output from Help

In Figure 3 you see the various function the package exposes. Now, let's see what we can do with it.

R Function

It is considered best practices to rewrite your R script as a function, and pass necessary parameters to the function. So we take our script and turn it into a function:

Function
irisFunc <- function(multiplier) {
iris_dataset <- iris
setosa <- iris[iris$Species == 'setosa',]
menSepWidth <- mean(setosa$Sepal.Width)
iris_dataset$Sepal.Length <- iris_dataset$Sepal.Length * multiplier
sepLength <- data.frame(iris_dataset$Sepal.Length)
retList <- list(sepLengthData = sepLength,
menSepWidthValue = menSepWidth)
return(retList)
}

Code Snippet 3: Create Function

In Code Snippet 3 we create the function and we:

The function will eventually be part of the script inside sp_execute_external_script, but before we go there we should define the required parameters for the procedure.

Define Parameters

In this example we have three parameters / objects:

When looking at the output from the help as in Figure 3, we see some functions that looks like doing what we want to do:

So to create our parameters we write some code like in Code Snippet 4:

Parameters
# if the sqlrutils package is not loaded yet:
# library("sqlrutils")
inParam <- InputParameter("multiplier", "numeric")
outParam <- OutputParameter("menSepWidthValue", "numeric")
outputData <- OutputData("sepLengthData")

Code Snippet 4: Parameter Creation

Notice how the output dataset, and the output parameter are named as the names in the list. This is important as we will see in a little bit. You can have multiple parameters (input/output), but only one output data set. If you have multiple parameters you define them as in Code Snippet 4.

Now when the parameters are defined, we can create the stored procedure.

Procedure Creation

To create a stored procedure we use the StoredProcedure class generator, and the signature of StoredProcedure looks like so:

Signature
StoredProcedure(func,
spName,
...,
filePath = NULL,
dbName = NULL,
connectionString = NULL,
batchSeparator = "GO")

Code Snippet 5: Signature for StoredProcedure Class Generator

The arguments in the signature are:

So, let's see what we should do to create a stored procedure based on what we have done so far (the code snippet below includes the parameter definitions from *Code Snippet 4):

Stored Procedure
inParam <- InputParameter("multiplier", "numeric")
outParam <- OutputParameter("menSepWidthValue", "numeric")
outputData <- OutputData("sepLengthData")
irisProc <- StoredProcedure(irisFunc,
"pr_IrisProc",
inParam,
outputData,
outParam,
filePath = "C:\Temp"
)

Code Snippet 6: Generate Stored Procedure from R Function

In Code Snippet 6 we:

We skip the rest of the arguments, and when we run the code in Code Snippet 6, the source file looks like so:

Generated Procedure
IF (OBJECT_ID('pr_IrisProc') IS NOT NULL)
DROP PROCEDURE pr_IrisProc
GO
CREATE PROCEDURE pr_IrisProc
@parallel_outer bit = 0,
@multiplier_outer float,
@menSepWidthValue_outer float output
AS
BEGIN TRY
exec sp_execute_external_script
@language = N'R',
@script = N'
irisFunc <- function (multiplier)
{
iris_dataset <- iris
setosa <- iris[iris$Species == "setosa", ]
menSepWidth <- mean(setosa$Sepal.Width)
iris_dataset$Sepal.Length <- iris_dataset$Sepal.Length *
multiplier
sepLength <- data.frame(iris_dataset$Sepal.Length)
retList <- list(sepLengthData = sepLength, menSepWidthValue = menSepWidth)
return(retList)
}
result <- irisFunc(multiplier = multiplier)
if (is.list(result)) {
OutputDataSet <- result$sepLengthData
menSepWidthValue <- result$menSepWidthValue
} else stop("the R function must return a list")
',
@parallel = @parallel_outer,
@params = N'@multiplier float, @menSepWidthValue float output',
@multiplier = @multiplier_outer,
@menSepWidthValue = @menSepWidthValue_outer output
END TRY
BEGIN CATCH
THROW;
END CATCH;
GO

Code Snippet 7: Generated Stored Procedure

That looks OK, a couple of things to point out:

You can now take the source file and deploy the procedure to a database of your choice. Then you execute the procedure as so:

Execute Proc
DECLARE @out float;
EXEC pr_IrisProc @parallel_outer = 0,
@multiplier_outer = 5,
@menSepWidthValue_outer = @out OUT;
SELECT @out;

Code Snippet 8: Execution of Generated Procedure

When executing, you should get the same result as you see in Figure 1 and Figure 2.

Deploying from R

Above we deployed the created procedure by running the script from SQL Server Management Studio, on the database we wanted to deploy to. You can also deploy straight from R, by using registerStoredProcedure. The signature of registerStoredProcedure looks like in Code Snippet 8:

Signature
registerStoredProcedure(sqlSP,
connectionString = NULL)

Code Snippet 8: Signature of registerStoredProcedure

We see in Code Snippet 8 how registerStoredProcedure takes the generated stored procedure object (created by StoredProcedure), and an optional connection string. It is optional as the connection string can be defined in StoredProcedure.

If we want to deploy to SQL Server directly the code looks like so:

tile: "Deploy"
conStr <- paste0("Driver={ODBC Driver 13 for SQL Server};Server=.;",
"Database=RTest;uid=sa;pwd=secretpassword;")
registerStoredProcedure(irisProc, conStr)

Code Snippet 9: Deploying from R

In Code Snippet 9, we continue from Code Snippet 6, and we have created the stored procedure object irisProc. We create a connection string, and pass irisProc and the connection string to registerStoredProcedure. After executing the code in Code Snippet 9, you should see the procedure in the database.

As you know, by using sp_execute_external_script, we can now in SQL Server 2016 and 2017 execute R scripts (and in SQL 2017 also Python) from inside SQL Server. sp_execute_external_script is a SQL Server extended stored procedure and you pass in the script to execute as well as parameter info to the procedure.

~ Finally

If you have comments, questions etc., please comment on this post or ping me.

Share this Post:

Twitter | Google+ | LinkedIn

Blog Feed:

To automatically receive more posts like this, please subscribe to my RSS/Atom feed in your feed reader!