Installing R Packages in SQL Server Machine Learning Services - I
This post was supposed to be a single post about how to install R packages in SQL Server Machine Learning Services, but once again I completely misjudged the scope of the topic. So this one post turned into a series of posts about how to install R packages in SQL Server Machine Learning Services and this is the first post in the series.
To see other posts in the series go to Install R Packages in SQL Server ML Services Series.
As you may know, I am in the process of writing the follow-up post to sp_execute_external_script and SQL Compute Context - I. However, I have a hard time getting into the flow of things, so I use any excuse I can, to not have to write. So when Dane Bax, a colleague of mine, contacted me a couple of days ago with a SQL Server Machine Learning Services problem, I jumped at the chance to help him, and also write a post about it.
His problem was that he wanted to use a CRAN package, bsts, which is not part of a standard SQL Server R installation. He tried a couple of things to get it installed but got errors, so he decided to get in touch with me.
NOTE: The package name bsts stands for Bayesian Structural Time Series, and the package performs time series regression using dynamic linear models fit using MCMC.
First of all: why would we need to install R packages if we already have R, either on a local machine or via SQL Server ML Services? The answer is that there are a multitude of packages "in the wild" that do not necessarily come with your R engine of choice, and bsts is an excellent example of this.
If you are an R developer you are probably accustomed to installing packages on your R development environment at will, and - more or less - at whatever location you choose. When using SQL Server ML Services however, it does not work like that as SQL Server cannot load packages from external libraries, even if that library is on the same computer. So when using SQL Server ML Services, you can only install packages to a default library associated with the instance.
The installation of packages can be done in different ways which is what this post is about - but before that, let us look at something somewhat different: Rtools.
The cool thing with R is that it is open source and you can run it on multiple platforms (Windows, Mac, Linux). So in essence, whatever package you want to use you can run on your platform of choice. If you install a package on Mac or Windows, R downloads and installs a package pre-compiled for your OS. On Linux, R downloads the source of the package and builds it on your machine. For the build, R requires some external tools: gzip, a C/C++ compiler and so forth.
Why I mention this here is that certain packages do not have a binary built for Windows, and if you want to install such a package, you need to build the package from source on your environment. The problem with this is that most of the tools needed to build the package may not exist in the Windows environment.
To be able to compile from source on Windows, the people behind R have made available an installer which installs the required tools for compilation of packages: Rtools. So if you think that you ever need to compile an R package from source, then ensure that you have Rtools installed. Why I bring this up is that the bsts package has a dependency on a package that needs to be compiled.
While you can install R packages to a remote machine, keep in mind that Rtools is not an R package as such, and it is not installed into the engine - but onto the machine where R is. So, for us here, we need to be on the machine that hosts SQL Server ML Services and run the installation on that machine.
We first need to download the installer for Rtools, and when you browse to the download page, you see there are multiple versions depending on what version of R you have. To find the R version of your SQL Server ML Services installation you can run the following code:
Code Snippet 1: Retrieve R Version
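As a minimal sketch (not necessarily the exact code behind Code Snippet 1), the version can be read from R's built-in R.version list. The snippet is plain R and assumes you run it in the instance's own R engine, for example from RTerm.exe or wrapped in the @script parameter of sp_execute_external_script:

```r
# Build a "major.minor" version string from the built-in R.version list.
# R.version$minor already contains the patch level (e.g. "3.3" for 3.3.3).
r_version <- paste(R.version$major, R.version$minor, sep = ".")
print(r_version)
```

The resulting string is what you match against the version ranges listed on the Rtools download page.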
When I run the code in Code Snippet 1 on my SQL Server ML Services instance, I see that the R version is 3.3.3, so I download Rtools35.exe to my SQL Server machine and run the installer. By default Rtools installs to C:\Rtools and R looks for compilers in the default installation path. If you install anywhere else, you have to point R to the path of ld by setting a variable called BINPREF. The Rtools installation instructions discuss this in detail. During the install, ensure you check the checkbox for "Add rtools to system PATH":
Figure 1: Adding Rtools to PATH
After having checked the box for editing the PATH as in Figure 1, click through and let the install finish. After installation, it is good practice to check that the PATH is set. You can do this by running RTerm.exe (on the SQL Server box) and executing Sys.getenv('PATH') from RTerm's command prompt. You find RTerm.exe in the R_SERVICES\bin\x64 directory under the path to the SQL Server instance. For example: C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\R_SERVICES\bin\x64.
NOTE: Just a word of caution here. When I installed Rtools35.exe I had to manually add the path to the compilers to PATH: C:\Rtools\mingw_64\bin. So look out for that.
You should also check that you can call the C++ compiler: system('g++ -v') (this is how I realised the path was not correct). That should result in something like:
Figure 2: Checking for C++ Compiler
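The two checks above can be combined into one small script. This is only a sketch: it assumes it runs in the instance's R engine (for example from RTerm.exe), and the search for the text "rtools" in the PATH entries is my own heuristic, which only works if Rtools is installed to a folder with "rtools" in its name.

```r
# Split PATH into its individual entries (";" on Windows, ":" elsewhere).
path_entries <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]

# Entries that mention Rtools (case-insensitive), e.g. C:\Rtools\mingw_64\bin.
rtools_entries <- path_entries[grepl("rtools", path_entries, ignore.case = TRUE)]
print(rtools_entries)

# Full path to g++ as resolved via PATH; an empty string means it was not found.
gpp_path <- Sys.which("g++")
if (nzchar(gpp_path)) {
  system("g++ -v")
} else {
  message("g++ not found on PATH; check the Rtools compiler directory")
}
```

If the compiler directory is missing from the printed entries, that is the situation described in the note above, and the path has to be added manually.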
If everything looks OK, Rtools has now been installed, and the various instances of R (from SQL Server ML instances) can share the Rtools toolchain.
R Packages Installation
When we install R packages for SQL Server ML Services we install them per SQL Server instance, and we can install these packages in different ways:
- R package managers.
Regardless of how we install the packages, they can only be installed to the default package library for that instance. The file system folder for this library has restricted access, and to write to this folder you need admin rights. Well, that is not entirely correct: with some configuration changes even non-admins can install packages via T-SQL and RevoScaleR. However, as we see later, that installation is against the current database.
R Code for Installation
Before we look at the ways we can install and the tools for installation; what does the code we use to do the installation look like?
As you probably know, the way to install R packages is through the install.packages command. The command has quite a few parameters as you can see here, but when I install packages I use only a few of the parameters, regardless of the way I install the package:
Code Snippet 2: Install Packages Command
In Code Snippet 2 we see how I first retrieve the library path. This is where I install the package to, and once again, I can only install to the default instance directory. Then in the install.packages call I use these parameters:
- The first parameter is always the name of the package(s) to install.
- lib: the library folder to install to.
- repos: the base URL(s) of the repositories to use. If left out, the repo used is the Microsoft MRAN repo, which may not be what you want.
- dependencies: indicates whether to also install missing packages which these packages depend on/link to/import/suggest (and so on recursively).
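Put together, a call with these parameters might look like the sketch below. The helper name install_to_instance is hypothetical, and so is the assumption that the first entry of .libPaths() is the instance's default library when the code runs in the instance's R engine. The CRAN URL is only a default that you can swap for a mirror of your choice.

```r
# Hypothetical helper: install packages into the first library path of the
# R engine this code runs in (assumed to be the instance's default library).
install_to_instance <- function(pkgs, repo = "https://cran.r-project.org") {
  lib_path <- .libPaths()[1]
  install.packages(pkgs, lib = lib_path, repos = repo, dependencies = TRUE)
  # Return the library path invisibly so the caller can see where it went.
  invisible(lib_path)
}

# Example call, commented out because it downloads and installs packages:
# install_to_instance("bsts")
```

Wrapping the call in a function keeps the library path and repo in one place, so the same code can be reused for other packages.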
So that is the code for installation of packages. What if you want to see what packages are installed on a particular instance of SQL Server ML Services? For that you can execute something like so from SQL Server Management Studio (SSMS):
Code Snippet 3: Retrieve Installed R Packages
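The R part of such a check can be sketched with installed.packages(), which returns one row per installed package. This is plain R rather than the exact code from Code Snippet 3; to run it from SSMS you would place it in the @script parameter of sp_execute_external_script and return the data frame as the output data set.

```r
# One row per installed package; keep only the columns of interest.
pkg_matrix <- installed.packages()[, c("Package", "Version", "LibPath"), drop = FALSE]

# Drop the row names (package names) so duplicates across libraries cause no trouble.
rownames(pkg_matrix) <- NULL
pkgs <- as.data.frame(pkg_matrix, stringsAsFactors = FALSE)
print(head(pkgs))

# TRUE if a package named bsts is present in any of the library paths.
bsts_installed <- "bsts" %in% pkgs$Package
print(bsts_installed)
```

The LibPath column is worth a look: it shows which library each package was actually installed to.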
So now that we have seen some code for installing packages (and for seeing what packages already exist), let us look at using R package managers for the installation.
R Package Managers
What is an R package manager? It is an R command line tool or GUI installed on the SQL Server Machine Learning Services machine that can run with elevated permissions and target the R engine for the instance on which you want to install the package. The easiest is to use either of the R tools that come as part of SQL Server's R service:
- The command line tool: RTerm.exe
- The GUI: Rgui.exe
Once again you need to be able to run them with elevated access, so you need admin rights on the machine, and they can only run locally.
So let us say that Dane wants to install the bsts package mentioned above, and he has admin rights on the machine SQL Server is installed on. His choice is between the command line tool and the GUI. Dane is not really into command line, so he uses Rgui.exe:
- He logs onto the SQL Server machine either directly or via Terminal Services.
- He navigates to where Rgui.exe is (the same path as above for RTerm.exe).
- He right clicks on Rgui.exe and chooses "Run as administrator":
Figure 3: Run Rgui as Admin
When he clicks on "Run as administrator" the Rgui application starts up in the context of the R engine of the instance of SQL Server ML Services:
Figure 4: Rgui
In the R Console in Rgui Dane now enters this code:
Code Snippet 4: Install bsts
In Code Snippet 4 we see that Dane uses the open source CRAN repo which hosts bsts. When he executes the code, it looks like so:
Figure 4: Execute install.packages
At the highlighted question in Figure 4 (at the bottom) it is best to answer no. Even though Dane said no to compilation, quite a lot of compilation happens for the Boom package, which bsts depends on:
Figure 5: C++ Compilation
The installation process runs for quite a while due to the compilation of the Boom package, but eventually it finishes:
Figure 6: Install Success
Dane can now check whether the bsts package has installed: he executes the code in Code Snippet 3 to verify that bsts is indeed installed together with the dependent packages. To further confirm that the package exists and functions he can try to load it from SSMS:
Code Snippet 8: Loading the bsts Library
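A hedged sketch of the R side of such a load test follows. It is plain R, so from SSMS it would go inside the @script parameter of sp_execute_external_script, and it is not necessarily what Code Snippet 8 contains. It uses requireNamespace() so that a missing package produces a message rather than an error.

```r
# requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise.
bsts_available <- requireNamespace("bsts", quietly = TRUE)

if (bsts_available) {
  # Attach the package and report which version was found.
  library(bsts)
  print(packageVersion("bsts"))
} else {
  # Show where R looked, which helps when the package went to another library.
  message("bsts could not be loaded from: ", paste(.libPaths(), collapse = "; "))
}
```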
Executing the code in Code Snippet 8 results in:
Figure 7: Loading bsts
From Figure 7 it seems that everything has worked, sweet!
That is cool, no?! Well, there is one drawback with this: Dane has to have admin rights on the SQL Server box and, no offense Dane, but who in their right mind would give Dane those rights on a production SQL Server box!
Jokes aside, using an R package manager may be too inconvenient: any time a developer wants to install packages, someone with admin rights on the box needs to install said packages. In coming posts we look at other options for installing packages.
In this post we covered:
- When you install packages sometimes they require compilation. For that, Rtools should be on the box where SQL Server ML Services lives.
- There are multiple ways we can install packages:
- R package managers.
- An R package manager is an R command line tool or GUI installed on the SQL Server Machine Learning Services machine that can run with elevated permissions and target the R engine for the instance on which you want to install the package.
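- SQL Server ML Services ships with two R package managers: RTerm.exe (command line) and Rgui.exe (GUI).
- These two package managers live in the R_SERVICES\bin\x64 directory under the path to the SQL Server instance.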
- When using a package manager to install a package, you run the package manager with elevated permissions.
- You can use the R command install.packages to install a package from the package manager.
If you have comments, questions etc., please comment on this post or ping me.