Creating R Stored Procedures in SQL Server 2016 Using sqlrutils
In the Microsoft R Server forum the other day was a question about inputs to SQL Server R stored procedures, or rather about "strange" parameter naming, when using the sqlrutils package. I got intrigued by the question and started
playing around with researching it, and this blog-post is the result. Oh, and this blog-post would never have seen the light of the day, if it hadn't been for the original poster of the question: JD Long, thanks JD!
So, what is SQL Server R stored procedures and what is sqlrutils?
Before we answer the question above, let's do a quick recap of
sp_execute_external_script is an extended stored procedure, and it allows us to execute R scripts (and in SQL 2017 also Python) from inside SQL Server. You use it by passing it the script you want to execute as well as various parameters. Let's assume you have some R code looking like so:
Code Snippet 1: R Script
As you see in Code Snippet 1, we:
- Set the value of the
- Reads in the
irisdata into a data frame:
- Create a new data frame for the
- Calculate the
setosaand assigns that to a variable:
- Multiply the
Sepal.Lengthproperty with the
- Finally output the calculated property, and print out the
Yes, yes, I know - the code is very, very simplistic, but it will still get the points across.
NOTE: If you haven't heard about the
irisdataset before, you can read more about it here.
To use this script in SQL Server with
sp_execute_external_script, you would write something like this:
Code Snippet 2: The iris Script in SQL Server
As you see from Code Snippet 2, the R script is passed in to the
@script parameter of
sp_execute_external_script. The actual script looks almost identical to what is in Code Snippet 1, except for instead of the
View(iris_dataset$Sepal.Length), we use
OutputDataSet variable is a well known variable in SQL Server R Services, and it is the default variable that will contain the data to be returned to SQL Server as a result-set upon completion of the stored procedure call. The layout of the result-set is defined in the
WITH RESULT SETS statement, and in Code Snippet 2 above we are interested in one column, of type
float and we name it
The parameters we use are defined in the
@params parameter, followed by the parameters themselves. In this case
@multiplier which we set to 5, and
@menSepWidth which is
OUTPUT. If you have used
sp_executesql the parameter syntax should be familiar to you. We assign the (at the top of the script) declared
@out_val variable to the
@menSepWidth output parameter, and at the end of the script we
SELECT it out.
When running Code Snippet 2, the output will be like in the two figures below:
Figure 1: Result-set
Figure 1 shows the output from
OutputDataSet, the calculated
Figure 2: Output Parameter
In Figure 2 we see the output parameter;
This is nice, being able to execute
sp_execute_external_script, and run our R scripts in SQL Server. However how do I operationalize this, I can't really be running these scripts every time I want to "crunch" some data?! Well, the answer to that is to wrap the code we see in Code Snippet 2 in a stored procedure, which would take the necessary parameters, and then execute
sp_execute_external_script inside the procedure.
To make it easier to create wrapping procedures, Microsoft has an R package to help with this: sqlrutils.
The sqlrutils package allows R users to, from inside an R IDE of choice, put their R scripts into a T-SQL stored procedure, register that stored procedure with a database, and run the stored procedure from an R development environment.
Let us see what the package does. To do that run from inside an R IDE, I am using R Tools for Visual Studio 2017 (RTVS),
NOTE: Before you run the
helpstatement above, you may have to run:
library(sqlrutils), to load the package.
When running `help(package="sqlrutils"), the httpd help server starts and you should see something like so:
Figure 3: Output from Help
In Figure 3 you see the various function the package exposes. Now, let's see what we can do with it.
It is considered best practices to rewrite your R script as a function, and pass necessary parameters to the function. So we take our script and turn it into a function:
Code Snippet 3: Create Function
In Code Snippet 3 we create the function and we:
- pass in the
- declare the
menSepWidthvariable inside the function
iris_dataset$Sepal.Lengthto a data frame variable
- create a named list, containing the data frame as well as
- return the list
The function will eventually be part of the script inside
sp_execute_external_script, but before we go there we should define the required parameters for the procedure.
In this example we have three parameters / objects:
- the output dataset
When looking at the output from the
help as in Figure 3, we see some functions that looks like doing what we want to do:
So to create our parameters we write some code like in Code Snippet 4:
Code Snippet 4: Parameter Creation
Notice how the output dataset, and the output parameter are named as the names in the list. This is important as we will see in a little bit. You can have multiple parameters (input/output), but only one output data set. If you have multiple parameters you define them as in Code Snippet 4.
Now when the parameters are defined, we can create the stored procedure.
To create a stored procedure we use the
StoredProcedure class generator, and the signature of
StoredProcedure looks like so:
Code Snippet 5: Signature for StoredProcedure Class Generator
The arguments in the signature are:
func- the function name, of the function we want wrapped.
spName- name for the generated stored procedure
...- optional input and output parameters (including output dataset)
filePath- optional path to where to create and save a .sql source file.
dbName- optional name specifying the database to use in the script file.
connectionString- optional connection string.
batchSeparator- what batch separator to use.
So, let's see what we should do to create a stored procedure based on what we have done so far (the code snippet below includes the parameter definitions from *Code Snippet 4):
Code Snippet 6: Generate Stored Procedure from R Function
In Code Snippet 6 we:
- define the function to use is the function assigned to the
irisFuncvariable from Code Snippet 3.
- say that we want the created procedure to be named
- define the input and output parameters though the variables we have created. If you have defined more parameters (as we spoke about above), you add them to the
- set the file path where we want the source files to be created.
We skip the rest of the arguments, and when we run the code in Code Snippet 6, the source file looks like so:
Code Snippet 7: Generated Stored Procedure
That looks OK, a couple of things to point out:
- Parameter names are appended with _outer. This is what JD Long asked about; where comes that from? It seems to be the convention to do that, to separate the parameters for
sp_execute_external_scriptfrom the parameters for the outer procedure. Hence the _outer in the name.
- What's this
@parallel_outerparameter, we did not define it anywhere? In
sp_execute_external_scriptyou have the ability to define that you wish the execution of the R script should happen in parallel, and
@parallelis an existing parameter on
sp_execute_external_script. So by default, this parameter is always created (and defaulted to 0).
- We then see the function definition as part of the
@scriptparameter, together with generated code to execute and retrieve the results. This is where it is important that you return a named list from the function, especially if you have more than one output parameter (including the output dataset). The names in the list should be the same as you set in your parameter definitions.
You can now take the source file and deploy the procedure to a database of your choice. Then you execute the procedure as so:
Code Snippet 8: Execution of Generated Procedure
When executing, you should get the same result as you see in Figure 1 and Figure 2.
Deploying from R
Above we deployed the created procedure by running the script from SQL Server Management Studio, on the database we wanted to deploy to. You can also deploy straight from R, by using
registerStoredProcedure. The signature of
registerStoredProcedure looks like in Code Snippet 8:
Code Snippet 8: Signature of registerStoredProcedure
We see in Code Snippet 8 how
registerStoredProcedure takes the generated stored procedure object (created by
StoredProcedure), and an optional connection string. It is optional as the connection string can be defined in
If we want to deploy to SQL Server directly the code looks like so:
Code Snippet 9: Deploying from R
In Code Snippet 9, we continue from Code Snippet 6, and we have created the stored procedure object
irisProc. We create a connection string, and pass
irisProc and the connection string to
registerStoredProcedure. After executing the code in Code Snippet 9, you should see the procedure in the database.
As you know, by using
sp_execute_external_script, we can now in SQL Server 2016 and 2017 execute R scripts (and in SQL 2017 also Python) from inside SQL Server.
sp_execute_external_script is a SQL Server extended stored procedure and you pass in the script to execute as well as parameter info to the procedure.
If you have comments, questions etc., please comment on this post or ping me.
Share this Post:Twitter | Google+ | LinkedIn