Microsoft SQL Server R Services - Internals VIII


This post is part of a series of blog-posts about Microsoft SQL Server R Services:

  1. Microsoft SQL Server 2016 R Services Installation
  2. Microsoft SQL Server R Services - Internals I
  3. Microsoft SQL Server R Services - Internals II
  4. Microsoft SQL Server R Services - Internals III
  5. Microsoft SQL Server R Services - Internals IV
  6. Microsoft SQL Server R Services - Internals V
  7. Microsoft SQL Server R Services - Internals VI
  8. Microsoft SQL Server R Services - Internals VII
  9. Microsoft SQL Server R Services - Internals VIII (this post)
  10. Microsoft SQL Server R Services - Internals IX
  11. Microsoft SQL Server R Services - Internals X
  12. Microsoft SQL Server R Services - Internals XI
  13. Microsoft SQL Server R Services - Internals XII
  14. Microsoft SQL Server R Services - Internals XIII
  15. Microsoft SQL Server R Services - Internals XIV
  16. Microsoft SQL Server R Services - Internals XV
  17. More to come (hopefully)

This post is the ninth post about Microsoft SQL Server R Services, and the eight post that drills down into the internal of how it works.

So far in this series we have been looking at what happens in SQL Server as well as the launchpad service when we execute sp_execute_external_script, and we have still no real "clue" to where the R engine comes into play.

Well, hopefully that will change (at least a little bit) with this post, as we here will look at what happens when we leave the launchpad service.

Before we dive into the "juicy" stuff, let's remind ourselves where we are.

Recap

Normally in the recaps in this series, we have looked at what was covered in the previous "episode". In this recap, let's look at he "full" picture up until now.

The first post in the series - Microsoft SQL Server 2016 R Services Installation - covered the installation of SQL Server 2016 R Services, and it also touched upon the external procedure which allows us to execute external scripts; sp_execute_external_script. We looked at the signature of the procedure as well as executing the equivalent to a "Hello World" script.

In the subsequent posts we talked about - when executing sp_execute_external_script - how SQL Server calls into the launchpad service, and how the launchpad service - through the rlauncher.dll creates multiple Rterm.exe processes as in Figure 1 below. One of the processes will be used to run the external script:

Figure 1: Rterm.exe Processes

I addition to creating Rterm.exe processes, the launchpad service also creates backing directories for those processes. These backing directories are used for saving output, intermediate results etc. The following figure was used to show what the call flow looks like:

Figure 2: Call Flow Executing sp_execute_external_script

We discussed how the number of processes can be controlled by the PROCESS_POOL_SQLSATELLITE_GROWTH setting in rlauncher.config file, and how it defaults to 5 if nothing is set.

In Internals - VI we came back to the backing directories, and we realized that in addition to the backing directories created for the Rterm processes, one more directory is created - which will be the "official" working directory for the session, and we showed this using this figure:

Figure 3: Launchpad, Directories and Processes

While we were investigating the directories created in Internals - VI we saw that, while we executed an external script, files and sub-directories were created in the various backing directories:

Figure 4: Contents Process Directory

In Figure 4 we see he content of the directory which is the processing directory, and in Internals - VII we looked into what creates those files/directories and what they are for. We came to the conclusion that both the launchpad service (probably through the rlauncher.dll) created some files, whereas Rterm.exe created others.

That's where we are, and now it is time to look at the world outside of the launchpad, and try to figure out what is being used when executing an external script.

Rterm.exe

In Internals - III we saw how the launchpad service creates Rterm processes, and how the launchpad service calls launchpad!CSQLSatelliteCommunication::SendResumeWithLoginInfo followed by launchpad!CSQLSatelliteConnection::WriteMessage to call into the Rterm process chosen for execution of the external script. So what does Rterm.exe and the RTerm process do?

To answer that let us first look at what the R "engine" really is, and let's do that by looking at standard CRAN R, and for this I use RStudio as an IDE. So, let's open RStudio and make sure it uses the CRAN R engine:

Figure 5: RStudion CRAN R

At that stage if I check in Process Explorer to see what processes are loaded I see something like this:

Figure 6: **

So in Figure 6 we see how the rstudio.exe is host for rsession.exe, which in turn host conhost.exe. For this discussion conhost.exe has no real impact, but if you want to know more about it, here is a good article explaining why we have conhost.exe.

Going back to rsession.exe; in Process Explorer, we look at what dll's it has loaded (click on `rsession.exe followed by Ctrl-D ) we'll see something like this:

Figure 7: Loaded dll's

Figure 7 above shows 7 dll's loaded by rsession.exe, and what is interesting is the second from bottom, which has a path pointing to the R libraries. When we look further down in the list we'll see more dll's from the R libraries:

Figure 8: R Dlls

Among the dll's we see the R.dll, which is the main dll for R. So, it seems that rsession.exe acts as a host for the R engine, and the R engine is a "bunch" of dll's.

Coming back to rterm.exe and what it is, we can now assume it acts in the same way as rsession.exe. Let us go back to the code that we used in Internals - III when we captured the figure we above in Figure 1:

Script with Pause
EXEC sp_execute_external_script
@language =N'R',
@script=N'OutputDataSet<-InputDataSet;
Sys.sleep(120);',
@input_data_1 =N'SELECT 42'
WITH RESULT SETS (([TheAnswer] int not null));

Code Snippet 1: Execute with Sys.Sleep

Like we did in Internals - III, we'll execute the code in Code Snippet 1, and while the code is running we'll look in Process Explorer and see if we find anything of interest:

  1. Stop the launchpad service.
  2. Restart the launchpad service.
  3. Start Process Explorer, order by Process, and scroll down to where you see process names starting with "RT" (on my box there are none at this stage), or where the processes should be.
    • If you at this stage see RTerm, restart the launchpad service again and kill those processes.
  4. Execute the code in Code Snippet 1.

While the code is running, take a quick look in Process Explorer, and you should see something like so:

Figure 9: RTerm Processes

The difference between Figure 1 and Figure 9, is that in Figure 1 I had collapsed the Rterm processes. In Figure 9 we see how the launchpad service is host for multiple Rterm processes, and how Rterm.exe is host for conhost.exe as well as BxlServer.exe. We'll come back shortly to what BxlServer.exe is.

Let's look a bit closer at what dll's we find for one of the active Rterm processes, click on the Rterm process which has a CPU value, and then click Ctrl-D.

NOTE: You may have to re-execute the code in Code Snippet 1, maybe even increase the sleep time to 3 or 4 minutes to be able to see everything.

When I scroll down in the dll list I see some familiar R dll's, that we also saw in Figure 8:

Figure 10: Rterm R dll's

At this stage we can safely say that the Rterm.exe and its process hosts the R engine. That is cool (yeah I know - I need to get a life), but what does that BxlServer.exe do in the mix?

BxlServer

So the question was, what is the BxlServer.exe? To answer that let's take a step back, and look at the R offerings Microsoft has:

Let's see what happens if we pointed RStudio towards the Microsoft R Open environment:

Figure 11: RStudio Targeting MRO

In Process Explorer, looking at the rstudio process, it won't look any different than what it did when Rstudio targeted CRAN R as in Figure 6, so obviously the Microsoft R Open offering does not "know" of the BxlServer.exe we see in Figure 9. But, looking at the dll's loaded by rsession.exe, we see more or less the same as was loaded by rterm.exe in Figure 10:

Figure 12: RStudio MRO dll's

What would happen if RStudio targeted the Microsoft R Server (MRS) instead:

Figure 13: RStudio Targeting MRS

This time when looking at the rstudio process we do see BxlServer.exe as a process hosted by rsession.exe, so it seems that BxlServer.exe is specifically for the commercial versions of Microsoft R.

NOTE: There is no difference between the BxlServer.exe in MRS and SQL Server R Services. Well, not much anyway :), in next post we'll see some differences.

By now you are probably saying something like: "Niels, this is very informative - NOT, get to the point - what is BxlServer.exe". OK, so let us see some of the dll's BxlServer.exe has loaded (click on the BxlServer.exe process and Ctrl-D):

Figure 13: BxlServer Loaded dll's

Ooh, that's interesting, look at the path and especially what is in the red rectangle: RevoScaleR.

RevoScaleR, is an R package providing both High Performance Computing (HPC) and High Performance Analytics (HPA) capabilities for R. HPC capabilities allow you to distribute the execution of essentially any R function across cores and nodes, and deliver the results back to the user. HPA adds the ability to handle big data in a high performance way. In addition to this (or rather as part of) RevoScaleR also has functionality to handle data access with high performance, and BxlServer.exe is the host for all this.

In addition to run the HPC, HPA and data access, BxlServer.exe also coordinates with the R runtime in order to manage exchanges of data with SQL Server, and it runs most of the R processing. A lot of the coordination and handling of result sets in BxlServer.exe is handled by a dll loaded by BxlServer.exe: BxServerLink.dll. Below we see how the BxServerLink.dll is loaded by Bxlserver.exe.

Figure 14: BxServerLink Loaded by BxlServer.exe

Let's go back to Figure 13 , where we see how rsession.exe hosts BxlServer.exe. How does that work, I mean - rsession.exe is a CRAN R executable, and it does not know anything about Microsoft R Services and BxlServer.exe? The answer is R.dll; when rsession.exe loads R.dll (which is a Microsoft dll), the R.dll then starts to load other dll's. But, BxlServer.exe is not a dll, and it runs in a separate process, surely R.dll cannot do that? No, it cannot - but what it does do is to load a dll acting as a conduit: the RxLink.dll. The RxLink.dll creates a new process in where BxlServer.exe runs, and that is how communication happens between rsession.exe and BxlServer.exe. And that is also what happens in SQL Server R Services where Rterm.exe is the "host". Rterm.exe loads the R.dll which loads RxLink.dll who then creates a process for BxlServer.exe, and BxlServer.exe loads BxLinkServer.dll.

NOTE: In the next Internals post we'll look more at RxLink.dll, and BxLinkServer.dll.

Finally, let's make sure that BxlServer.exe actually is doing something when we execute an external script in SQL Server R Services. For that we'll use some code we wrote in Internals V, where we discussed parallelism:

Setup
USE master;
GO
SET NOCOUNT ON;
GO
DROP DATABASE IF EXISTS TestParallel;
GO
CREATE DATABASE TestParallel;
GO
USE TestParallel;
GO
DROP TABLE IF EXISTS dbo.rand_30M
CREATE TABLE dbo.rand_30M(RowID bigint identity primary key, y int,
rand1 int, rand2 int, rand3 int, rand4 int,
rand5 int);
INSERT INTO dbo.rand_30M(y, rand1, rand2, rand3, rand4, rand5)
SELECT TOP(30000000) CAST(ABS(CHECKSUM(NEWID())) % 14 AS INT)
, CAST(ABS(CHECKSUM(NEWID())) % 20 AS INT)
, CAST(ABS(CHECKSUM(NEWID())) % 25 AS INT)
, CAST(ABS(CHECKSUM(NEWID())) % 14 AS INT)
, CAST(ABS(CHECKSUM(NEWID())) % 50 AS INT)
, CAST(ABS(CHECKSUM(NEWID())) % 100 AS INT)
FROM sys.objects o1
CROSS JOIN sys.objects o2
CROSS JOIN sys.objects o3
CROSS JOIN sys.objects o4;
GO

Code Snippet 2: Setup of Database, Table and Data

The data we create in Code Snippet 2 makes absolutely no sense at all, but it will serve our purposes of have some volume of data to work with, and cause the execution to run for a little while.

NOTE: Go back to Internals - V if you are interested in seeing what this code does.

The code we want to execute in SQL Server looks like so:

Execute from SQL Server
EXEC sp_execute_external_script
@language = N'R'
, @script = N'
#Sys.sleep(30)
pid <- Sys.getpid()
r <- rxLinMod(y ~ rand1 + rand2 + rand3 + rand4 + rand5,
data=InputDataSet)
coef <- r$coefficients
icept <- coef[1];
OutputDataSet <- data.frame(pid=pid, nRows=r$nValidObs,
intercept=icept)'
, @input_data_1 = N'
SELECT y, rand1, rand2, rand3,
rand4, rand5
FROM dbo.rand_30M'
WITH RESULT SETS ((pid INT NOT NULL, nRows INT NOT NULL, intercept FLOAT NULL)
);

Code Snippet 3: Linear Regression in SQL Server

To run this:

Among the Rterm and BxlServer processes you should see one BxlServer.exe process that has a CPU value, as well as consuming some memory:

Figure 15: BxlServer Processing

So, as I mentioned above, it is the BxlServer.exe which is doing he heavy lifting in SQL Server R Services.

Summary

In this post we have now seen what happens when executing sp_execute_external_script:

Below I have tried to describe it in a figure:

Figure 15: BxlServer Processing

Notice how, in Figure 15, there is nothing about how data gets back to SQL Server. That is something we will discuss in next post. Oh and yes, Figure 15 looks very much like one of the figures you can see in this post, and I freely admit I have looked at it :).

~ Finally

If you have comments, questions etc., please comment on this post or ping me.

Share this Post:

Twitter | Google+ | LinkedIn

Blog Feed:

To automatically receive more posts like this, please subscribe to my RSS/Atom feed in your feed reader!