Microsoft SQL Server R Services - sp_execute_external_script - II


This post is part of a series of blog-posts about Microsoft SQL Server R Services:

  1. Microsoft SQL Server 2016 R Services Installation
  2. Microsoft SQL Server R Services - Internals I
  3. Microsoft SQL Server R Services - Internals II
  4. Microsoft SQL Server R Services - Internals III
  5. Microsoft SQL Server R Services - Internals IV
  6. Microsoft SQL Server R Services - Internals V
  7. Microsoft SQL Server R Services - Internals VI
  8. Microsoft SQL Server R Services - Internals VII
  9. Microsoft SQL Server R Services - Internals VIII
  10. Microsoft SQL Server R Services - Internals IX
  11. Microsoft SQL Server R Services - Internals X
  12. Microsoft SQL Server R Services - Internals XI
  13. Microsoft SQL Server R Services - Internals XII
  14. Microsoft SQL Server R Services - Internals XIII
  15. Microsoft SQL Server R Services - Internals XIV
  16. Microsoft SQL Server R Services - Internals XV
  17. Microsoft SQL Server R Services - Internals XVI
  18. Microsoft SQL Server R Services - Internals XVII
  19. Microsoft SQL Server R Services - Internals XVIII
  20. Microsoft SQL Server R Services - Internals XIX
  21. Microsoft SQL Server R Services - Internals XX
  22. Microsoft SQL Server R Services - sp_execute_external_script - I
  23. Microsoft SQL Server R Services - sp_execute_external_script - II (this post)
  24. Microsoft SQL Server R Services: sp_execute_external_script - III last post in the series

This post is the 23:rd post about Microsoft SQL Server R Services, and a continuation of sp_execute_external_script - I. I honestly thought this post would be the last in the series (I have thought quite a few times), but alas, that is not the case. There will be at least one more post about sp_execute_external_script after this.

Recap

In the last post we started looking at sp_execute_external_script, and we said that sp_execute_external_script is the bridge between SQL Server and external execution engines, like R or Python. The signature for the procedure looks like so:

sp_execute_external_script
sp_execute_external_script
@language = N'language',
@script = N'script',
[ @input_data_1 = ] 'input_data_1'
[ , @input_data_1_name = ] N'input_data_1_name' ]
[ , @output_data_1_name = N'output_data_1_name' ]
[ , @parallel = 0 | 1 ]
[ , @params = ] N'@parameter_name data_type [ OUT | OUTPUT ] [ ,...n ]'
[ , @parameter1 = ] 'value1' [ OUT | OUTPUT ] [ ,...n ]
[ WITH <execute_option> ]
[;]

Code Snippet 1: Full Signature of sp_execute_external_script

In the post we looked at the first five parameters:

We also said the data type of the values of all parameters have to be NVARCHAR.

This post covers the @params and @parameter1 parameters.

Housekeeping

Before we "dive" into today's topics let us look at the code and the tools we use today. This section is here for those who want to follow along in what we are doing in the post.

Code

In this post we use the same database and database objects as in sp_execute_external_script - I, so go there and grab the code if you need to create the database again.

Helper Tools

To help us figure out the things we want, we use Process Monitor, and WireShark:

Now then, let us start.

@params & @parameter1

Yeah, yeah - I know, in Code Snippet 1 above, the @parallel parameter comes before the @params parameter. However, it makes more sense for me to cover @params before @parallel.

The @params parameter is an optional parameter and when defined, it contains the definitions of all parameters embedded in the values for the @input_data_1 and the @script parameters. The string must be either a Unicode constant or a Unicode variable. Each parameter definition consists of a parameter name and a data type. In sp_execute_external_script - I we had code looking like so:

Input Data
EXEC sp_execute_external_script
@language = N'R',
@script = N'
iris_dataset <- InputDataSet
setosa <- iris_dataset[iris_dataset$Species == "setosa",]
meanSepWidth <- mean(setosa$SepalWidth)
cat(paste("Seposa sepal mean width: ", meanSepWidth))',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full'

Code Snippet 2: Using @input_data_1 with straight SELECT

In Code Snippet 2 we SELECT all data from the dbo.tb_irisdata_full table, and then in the script we filter out for the "setosa" species: setosa <- iris_dataset[iris_dataset$Species == "setosa",]. Instead of filtering in the script, our SELECT query can filter: SELECT * FROM dbo.tb_irisdata_full WHERE species = 'setosa', and subsequently the setosa value becomes a parameter: @specie. The query (@input_data_1) then looks like this: SELECT * FROM dbo.tb_irisdata_full WHERE species = @specie. To use this in Code Snippet 2 we now need to define the parameter in @params:

Parameter I
EXEC sp_execute_external_script
@language = N'R',
@script = N'
setosa <- InputDataSet
meanSepWidth <- mean(setosa$SepalWidth)
cat(paste("Seposa sepal mean width: ", meanSepWidth))',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full
WHERE Species = @specie',
@params = N'@specie nvarchar(20)'; ;

Code Snippet 3: Input Data with Parameter Definition

That all looks good, but when you execute you get an error:

Figure 1: Parameter Error

So, the error says that @specie is missing, which makes kind of sense since we have not assigned a value to the parameter. So what about this then:

Parameter II
DECLARE @specie nvarchar(20) = 'setosa';
EXEC sp_execute_external_script
@language = N'R',
@script = N'
setosa <- InputDataSet
meanSepWidth <- mean(setosa$SepalWidth)
cat(paste("Seposa sepal mean width: ", meanSepWidth))',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full
WHERE Species = @specie',
@params = N'@specie nvarchar(20)'; ;

Code Snippet 4: Input Data with Parameter Definition

In the code above (Code Snippet 4) we declare a @specie parameter and give it a value, but when we execute the procedure we get the same error as when we executed the code in Code Snippet 3. What gives, we are obviously missing something here?

What is missing is that we need to define the parameter as part of the procedure and also assign a value to it. Didn't we do that at the top of Code Snippet 4? No, what we did was we DECLARE: ed a new variable which had the same name as the parameter (@specie) but there is no reference to that variable anywhere, so it is not used.

To fix this, this is where @parameter1 comes in (@parameter1 is optional). In fact, there is no @parameter1, but it refers to the first parameter in the @params parameter list. When there is more than one parameter in the @params list, you have @parameter2, and so on. However, you do not refer to @parameter1 as @parameter1, but you name it as per the name in the @params list:

Parameter III
EXEC sp_execute_external_script
@language = N'R',
@script = N'
setosa <- InputDataSet
meanSepWidth <- mean(setosa$SepalWidth)
cat(paste("Seposa sepal mean width: ", meanSepWidth))',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full
WHERE Species = @specie',
@params = N'@specie nvarchar(20)',
@specie = 'setosa'; ;

Code Snippet 5: Defined the Actual Parameter

In Code Snippet 5 we see how the @params parameter lists the actual parameter, @specie, and how @specie is now a named parameter in the procedure. As with any parameters for stored procedures you can declare them as a variable, set the value, and assign them to the actual parameter:

Parameter IV
DECLARE @sp nvarchar(20) = 'setosa'
EXEC sp_execute_external_script
@language = N'R',
@script = N'
setosa <- InputDataSet
meanSepWidth <- mean(setosa$SepalWidth)
cat(paste("Seposa sepal mean width: ", meanSepWidth))',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full
WHERE Species = @specie',
@params = N'@specie nvarchar(20)',
@specie = @sp; ;

Code Snippet 6: Use Parameter and Variable

So that is parameters for the @input_data_1 query. How about parameters in the script, where we want to send in values for those? In sp_execute_external_script - I we had code that looked like this:

Return Data
EXEC sp_execute_external_script
@language = N'R',
@script = N'
iris_dataset <- MyDataSet
setosa <- iris_dataset[iris_dataset$Species == "setosa",]
meanSepWidth <- mean(setosa$SepalWidth)
cat(paste("Seposa sepal mean width: ", meanSepWidth))
multiplier <- 5
iris_dataset$SepalLength <- iris_dataset$SepalLength * multiplier
OutputDataSet <- data.frame(iris_dataset$SepalLength)',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full',
@input_data_1_name = N'MyDataSet'

Code Snippet 7: Return a Result Set

In Code Snippet 7 we see how the script uses the multiplier variable, whose value is hardcoded in the script. What if we want to send in a value to the script when the procedure runs? Well, we define a parameter for the variable and add it to the @params list, and then we name the parameter in the procedure and assign it a value. Merging Code Snippet 6 with what we see in Code Snippet 7, we get something like so:

Parameter V
DECLARE @mult int = 5;
DECLARE @sp nvarchar(20) = 'setosa'
EXEC sp_execute_external_script
@language = N'R',
@script = N'
setosa <- MyDataSet
meanSepWidth <- mean(setosa$SepalWidth)
cat(paste("Seposa sepal mean width: ", meanSepWidth))
multiplier <- multip
setosa$SepalLength <- setosa$SepalLength * multiplier
OutputDataSet <- data.frame(setosa$SepalLength)',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full
WHERE Species = @specie',
@input_data_1_name = N'MyDataSet',
@params = N'@specie nvarchar(20), @multip int',
@specie = @sp,
@multip = @mult;

Code Snippet 8: Multiple Parameters

What we see in Code Snippet 8 is how the @multip parameter is defined in the @params list and added to the stored procedure. We have also created a variable, @mult, which we have set a value for, we have then assigned the variable to the parameter.

NOTE: When you refer to the parameters in the script file, you refer to them without the ampersand, so @multip becomes multip and so forth.

One more thing, we have now seen how parameters are used to assign to a query as well as pass into the script. When we look at the code in Code Snippet 8 we see how we:

In a "real-world" scenario, you would not print out the variable, but instead wanting it passed back to the calling code. As with any code in SQL Server, when we want data passed back (data that is not resultsets that is), we use output parameters. By now we should know the "drill"; add a parameter to the @params list, mark it as OUTPUT and then add the parameter to the procedure:

Output
DECLARE @meanOut float;
DECLARE @mult int = 5;
DECLARE @sp nvarchar(20) = 'setosa'
EXEC sp_execute_external_script
@language = N'R',
@script = N'
setosa <- MyDataSet
meanSepWidth <- mean(setosa$SepalWidth)
multiplier <- multip
setosa$SepalLength <- setosa$SepalLength * multiplier
OutputDataSet <- data.frame(setosa$SepalLength)',
@input_data_1 = N'SELECT * FROM dbo.tb_irisdata_full
WHERE Species = @specie',
@input_data_1_name = N'MyDataSet',
@params = N'@specie nvarchar(20), @multip int, @meanSepWidth float OUT',
@specie = @sp,
@multip = @mult,
@meanSepWidth = @meanOut OUT;
SELECT @meanOut AS MeanSepWidth;

Code Snippet 9: Use of Output Parameter

In Code Snippet 9 we see how the @meanSepWidth parameter is defined to the @params list as OUTPUT (OUT works as well) and added to the procedure. When we execute the code we get two resultsets coming back, one from the OutputDataSet <- data.frame(setosa$SepalLength) and the second from SELECT @meanOut AS MeanSepWidth:

Figure 1: Resultsets Coming Back

The last thing to discuss related to parameters are how it works internally, similar to how we in sp_execute_external_script - I discussed how the script knew about the input and output datasets. Here we discuss how the script and the engine know about parameters and their values. Oh, in [Internals - XVIII] we discussed the mechanism with which the value(s) of the output parameter(s) come back, so, therefore, we do not cover return values of output parameters in this post.

To begin with, in what packets are the parameters passed? To find this out, we use some much-simplified code from what we have used so far:

Text
EXEC sp_execute_external_script
@language = N'R',
@script = N'
d <- 42
cat(paste("Answer:", d))';
EXEC sp_execute_external_script
@language = N'R',
@script = N'
d <- multip
cat(paste("Answer:", d))',
@params = N'@multip int',
@multip = 42;

Code Snippet 10: Execution with and without Parameters

As we see in Code Snippet 10 there are two instances of sp_execute_external_script, both are very similar, and both generate the same result. What differs is that the first one has the value 42 hard-coded where the other pass the value via a parameter. The idea is now to initially to use Process Monitor and compare the packets SQL Server sends to the SqlSatellite to see if we can figure out what packet passes the parameter(s). So:

When we execute the two statements in Code Snippet 10, Process Monitor outputs this:

Figure 2: Packets Sent

The black line in Figure 2 separates the two executions, and we see how the third packet in each run differs in size (outlined in blue and red respectively). The packet outlined in red is bigger and comes from the second run - where we used the @multip parameter. The third packet is the packet where SQL Server sends the script to the SqlSatellite, and from what we can see so far it looks like some parameter information is part of this packet. Let us switch over to WireShark and see what goes on.

Execute the code in Code Snippet 10 where we use the @multip parameter and look at the output from WireShark. In the output is a packet with the same length as the packet outlined in red in Figure 2, let us have a look at the data part of that packet:

Hex Dump I
01 00 05 80 25 01 00 00 e3 01 86 81 e1 a2 2e 43 ....%... .......C
b8 e0 a7 31 0d d1 83 dd 00 00 01 00 00 00 01 00 ...1.... ........
00 00 00 00 1a 00 49 00 6e 00 70 00 75 00 74 00 ......I. n.p.u.t.
44 00 61 00 74 00 61 00 53 00 65 00 74 00 00 00 D.a.t.a. S.e.t...
1c 00 4f 00 75 00 74 00 70 00 75 00 74 00 44 00 ..O.u.t. p.u.t.D.
61 00 74 00 61 00 53 00 65 00 74 00 00 00 30 00 a.t.a.S. e.t...0.
00 00 30 00 00 00 01 00 03 00 00 00 00 00 00 00 ..0..... ........
12 00 45 00 78 00 70 00 72 00 31 00 30 00 30 00 ..E.x.p. r.1.0.0.
30 00 00 00 38 00 00 00 38 00 00 00 04 00 0a 00 0...8... 8.......
00 00 00 00 00 04 00 00 00 2a 00 00 00 10 00 40 ........ .*.....@
00 6d 00 75 00 6c 00 74 00 69 00 70 00 00 00 70 .m.u.l.t .i.p...p
00 00 00 01 80 0d 00 0a 00 20 00 20 00 20 00 20 ........ . . . .
...

Code Snippet 11: *Hex Dump of Data Packet"

We see in Code Snippet 11 the first 192 bytes of the packet containing the script which SQL Server sent to the SqlSatellite. In the text representation of the hex, we can see the parameter name (@multip), and it starts at byte 160 (the first byte in hex is 40). So parameters are sent to the engine via the packet containing the script. What about the value of the parameter?

The value of the parameter is somewhat harder to figure out, seeing the value is an int and therefore not in clear-text. However, if we look closely at the hex, we see at byte 154 a hex value of 2a which in decimal is 42. So SQL Server passes the value(s) of the parameter(s) as well in the "script" packet. In this case, it was one parameter with one value, what happens if we have multiple parameters and values, for example, something like this:

Multiple Params
EXEC sp_execute_external_script
@language = N'R',
@script = N'
d <- multip1 * multip2
cat(paste("Answer:", d))',
@params = N'@multip1 int, @multip2 int',
@multip1 = 42,
@multip2 = 5;

Code Snippet 12: Script using Multiple Parameters

In the code in Code Snippet 12 we send two parameters to the script (@multip1 and @multip2) and the hex-dump we get after we execute the code looks like so:

Hex Dump II
01 00 05 80 6a 01 00 00 b8 b1 da 7e cc d0 4e 40 ....j... ...~..N@
90 95 c1 e3 e0 58 dd 7e 00 00 01 00 00 00 02 00 .....X.~ ........
00 00 00 00 1a 00 49 00 6e 00 70 00 75 00 74 00 ......I. n.p.u.t.
44 00 61 00 74 00 61 00 53 00 65 00 74 00 00 00 D.a.t.a. S.e.t...
1c 00 4f 00 75 00 74 00 70 00 75 00 74 00 44 00 ..O.u.t. p.u.t.D.
61 00 74 00 61 00 53 00 65 00 74 00 00 00 30 00 a.t.a.S. e.t...0.
00 00 30 00 00 00 01 00 03 00 00 00 00 00 00 00 ..0..... ........
12 00 45 00 78 00 70 00 72 00 31 00 30 00 30 00 ..E.x.p. r.1.0.0.
30 00 00 00 38 00 00 00 38 00 00 00 04 00 0a 00 0...8... 8.......
00 00 00 00 00 04 00 00 00 2a 00 00 00 38 00 00 ........ .*...8..
00 38 00 00 00 04 00 0a 00 00 00 00 00 00 04 00 .8...... ........
00 00 05 00 00 00 12 00 40 00 6d 00 75 00 6c 00 ........ @.m.u.l.
74 00 69 00 70 00 31 00 00 00 12 00 40 00 6d 00 t.i.p.1. ....@.m.
75 00 6c 00 74 00 69 00 70 00 32 00 00 00 86 00 u.l.t.i. p.2.....
00 00 01 80 0d 00 0a 00 20 00 20 00 20 00 20 00 ........ . . . .

Code Snippet 13: Hex Dump from Multiple Parameters

In Code Snippet 13 we see the first 240 bytes of the hex-dump from Code Snippet 12. When I saw this dump for the first time I was surprised to see that the parameter names are together. I thought they would appear together with the values. However as we clearly can see the values come first. We see hex 2a (decimal 42) at byte 154, and hex 05 (decimal 5) at byte 179 and the first parameter name starts at byte 185 (hex 40). In essence, the parameter values come before the parameter names.

When we look at the two hex-dumps, there is something else worth noticing. If we look closely at where the values are and the parameter names we see how:

After doing quite a few tests, I believe (almost 100% certain) that the value before the parameter value indicates the data size of the actual parameter value. In code snippets 11 and 13 that value is hex 04, which is decimal 4 - the size of an integer. The value in front of the parameter name tells us the length of the parameter name in double-byte. In Code Snippet 11 the value is hex 10 which is decimal 16, and that is the double-byte length of the parameter @multip. In Code Snippet 13 the value is hex 12 before each parameter. Hex 12 is decimal 18, which is the double-byte length of @multip1 and @multip2.

Summary

Phew, that was quite a lot. So :

~ Finally

If you have comments, questions etc., please comment on this post or ping me.

Share this Post:

Twitter | Google+ | LinkedIn

Blog Feed:

To automatically receive more posts like this, please subscribe to my RSS/Atom feed in your feed reader!