Difference between revisions of "Software:R"

From CAC Wiki

Latest revision as of 18:58, 30 October 2023

Introduction

R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. Although R is not ideally suited to HPC clusters, its functionality and flexibility for data handling and analysis make it an essential tool for researchers from a wide range of scientific disciplines. In addition, R provides a large repository of user-contributed packages suited to many different applications.

You can run your R scripts from our RStudio server or through a batch script on compute nodes. RStudio (https://login.cac.queensu.ca/rstudio/) provides users with an IDE to develop and run programs in a web browser, but it is constrained in terms of the available packages and compute resources. Alternatively, you can run your program from the command line or a batch script, which gives you the flexibility to choose among the different versions of R available on our central software stack and their respective packages. This page is a primer on running R scripts on compute nodes.

Starting R console

As with any software stack on the cluster, you need to load the version of R you require using the 'module load' command. You can load the default version using

$ module load r

Use the module spider command to list all the available versions and their dependencies.

$ module spider r

Several versions of R are available as modules (r/3.3.3 to r/4.1.2). Before loading a specific R version, you need to load all the modules it depends on, which you can determine using the module spider command

$ module spider r/4.1.2

   You will need to load all module(s) on any one of the lines below before the "r/4.1.2" module is available to load.
     StdEnv/2020

Now you can load r/4.1.2 using the following module load commands

$ module load StdEnv/2020
$ module load r/4.1.2

Similarly, if you want to load r/3.6.1

$ module spider r/3.6.1
$ module load nixpkgs/16.09 gcc/7.3.0
$ module load r/3.6.1

Typically, you need to load the corresponding gcc compiler module before loading R. Note: the StdEnv/2020 module includes gcc/9.3.0, so it is sufficient for loading r/4.1.2.
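
For non-interactive use, the module commands above can be wrapped in a small shell function that loads R and then runs a script with Rscript. This is only a minimal sketch: the function name run_r_script and the script name analysis.R are hypothetical, the module versions are the ones used in the example above, and any scheduler directives your batch system requires are deliberately left out because this page does not specify them.

```shell
#!/bin/bash
# Sketch: load the R module stack and run an R script non-interactively.
# run_r_script is a hypothetical helper name, not part of the cluster software.

run_r_script() {
    local script="$1"

    # `module` is provided by the cluster environment; stop early elsewhere.
    if ! type module >/dev/null 2>&1; then
        echo "module command not available; run this on the cluster" >&2
        return 1
    fi

    # Load the prerequisite reported by `module spider r/4.1.2`, then R itself.
    module load StdEnv/2020 || return 1
    module load r/4.1.2 || return 1

    # Rscript executes the file without starting an interactive console.
    Rscript "$script"
}
```

Calling `run_r_script analysis.R` from a batch script would then load the modules and execute the (hypothetical) file analysis.R in one step.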

Installing R packages

Each R module comes with the base R packages and might not have all the packages required for your project. You can install additional packages from CRAN using the install.packages() command. By default, R tries to install these packages in the folder where R was installed, $EBROOTR. Since users don't have write permission for this folder in the /cvmfs software stack, you will be prompted to install the R packages in a personal library in your home directory. Type "yes" and press Enter.

Several R packages use the GNU compilers during installation, so it is generally recommended to load the gcc module along with the R module.

$ module load StdEnv/2020
$ module load gcc/9.3.0 r/4.0.2
$ R
> install.packages('sp', repos='https://cloud.r-project.org/')
Warning in install.packages("sp", repos = "https://cloud.r-project.org/") :
  'lib = "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/r/4.0.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/4.0’
to install packages into? (yes/No/cancel) yes
...
>
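
The interactive prompts shown above are awkward inside a batch script. One possible way around them, sketched here under assumptions, is to create the personal library directory yourself and pass it to install.packages() through its lib argument. The function name install_r_package is hypothetical, and the sketch assumes the R and gcc modules have already been loaded so that Rscript is on the PATH.

```shell
#!/bin/bash
# Sketch: install a CRAN package into a chosen library directory without prompts.
# install_r_package is a hypothetical helper name.

install_r_package() {
    local pkg="$1"   # package name, e.g. sp
    local lib="$2"   # target library directory

    # Rscript must already be available (for example after `module load`).
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; load an R module first" >&2
        return 1
    fi

    # Create the library directory so R does not need to ask about it.
    mkdir -p "$lib" || return 1

    # lib= selects the install location; repos= selects the CRAN mirror.
    Rscript -e "install.packages('$pkg', lib='$lib', repos='https://cloud.r-project.org/')"
}
```

For the session above, `install_r_package sp "$HOME/R/x86_64-pc-linux-gnu-library/4.0"` would target the same personal library that R offered to create.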

Note: R packages installed under ~/R/x86_64-pc-linux-gnu-library/ are available only for the version of R they were installed with. In this example, the sp package is installed in ~/R/x86_64-pc-linux-gnu-library/4.0 and is available for 4.0 versions of R. If you want to use this package with another version, say 3.6, it needs to be installed again.

$ module load r/3.6
$ R
> install.packages('sp', repos='https://cloud.r-project.org/')
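
The version-specific layout described in the note can be illustrated with a small shell helper that derives the personal library directory from an R version string. This is a sketch under assumptions: the function name r_personal_lib is hypothetical, and the platform directory x86_64-pc-linux-gnu-library is simply copied from the transcript above rather than detected.

```shell
#!/bin/bash
# Sketch: print the personal library directory for a given R version.
# r_personal_lib is a hypothetical helper name.

r_personal_lib() {
    local version="$1"   # full version string, e.g. 4.0.2
    local major_minor

    # Keep only the major.minor part, since the library is shared per minor release.
    major_minor=$(echo "$version" | cut -d. -f1,2)

    echo "$HOME/R/x86_64-pc-linux-gnu-library/$major_minor"
}
```

Running `r_personal_lib 4.0.2` and `r_personal_lib 3.6.1` prints two different directories, which is why a package installed for one version is not visible to the other.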