Software:R

From CAC Wiki
Latest revision as of 18:58, 30 October 2023

Introduction

R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. Although R is not particularly well suited to HPC clusters, its functionality and flexibility for data handling and analysis make it an essential tool for researchers from a wide range of scientific disciplines. In addition, R provides a large repository of user-built packages suited to a wide range of applications.

You can run your R script from our RStudio server (https://login.cac.queensu.ca/rstudio/) or through a batch script on compute nodes. RStudio provides our users with an IDE to develop and run programs in a web browser, but it is constrained in terms of the available packages and compute resources. Alternatively, you can run your program from the command line or a batch script, which gives you the flexibility to choose among the different versions of R available on our central software stack and their respective packages. This page is a primer on running R scripts with Rscript on compute nodes.
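As a rough illustration of the batch route, the sketch below writes a minimal job script that loads R and runs an R script with Rscript. It assumes the cluster uses the Slurm scheduler; the helper name write_r_job, the file names, and the resource requests are hypothetical placeholders, and the module versions are simply the ones used elsewhere on this page.

```shell
# Write a minimal job script that loads R and runs an R script.
# Assumptions (not from this page): a Slurm scheduler, and the
# resource requests below, which are placeholders to adjust.
write_r_job() {
    job_file="$1"   # where to write the job script, e.g. r_job.sh
    r_script="$2"   # R script to run, e.g. analysis.R (hypothetical name)
    cat > "$job_file" <<EOF
#!/bin/bash
#SBATCH --job-name=r-job
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=01:00:00

# Load the prerequisite module first, then R itself.
module load StdEnv/2020
module load r/4.1.2

Rscript "$r_script"
EOF
}
```

If Slurm is indeed the scheduler, calling write_r_job r_job.sh analysis.R and then submitting with sbatch r_job.sh would queue the script on a compute node.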

Starting R console

As with any software stack on the cluster, you need to load a specific version of R with the 'module load' command, depending on your requirements. You can load the default version using

 $ module load r

Use the module spider command to list all the available versions and their dependencies.

$ module spider r

Several versions of R are available as modules (r/3.3.3 to r/4.1.2). Before loading a specific R version, you need to load all the modules it depends on, which can be determined using the module spider command:

$ module spider r/4.1.2

   You will need to load all module(s) on any one of the lines below before the "r/4.1.2" module is available to load.
     StdEnv/2020

Now you can load r/4.1.2 using the following module load commands

$ module load StdEnv/2020
$ module load r/4.1.2

Similarly, to load r/3.6.1:

$ module spider r/3.6.1
$ module load  nixpkgs/16.09  gcc/7.3.0
$ module load r/3.6.1

Typically, you need to load the corresponding gcc compiler module before loading R. Note: the StdEnv/2020 module includes gcc/9.3.0, so loading it is sufficient before loading r/4.1.2.
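The two loading sequences above can be wrapped in a small shell function, sketched here. The function name load_r is hypothetical, and the prerequisite lists are copied from the module spider output shown on this page, so any other version would need its own entry.

```shell
# Load an R module together with the prerequisite modules reported by
# "module spider". Only the two versions documented on this page are
# known; load_r is a hypothetical helper, not part of the module system.
load_r() {
    version="$1"
    case "$version" in
        4.1.2) prereqs="StdEnv/2020" ;;
        3.6.1) prereqs="nixpkgs/16.09 gcc/7.3.0" ;;
        *)
            echo "unknown prerequisites for r/$version; run: module spider r/$version" >&2
            return 1
            ;;
    esac
    # $prereqs is intentionally unquoted so each module is a separate argument.
    module load $prereqs && module load "r/$version"
}
```

For example, load_r 3.6.1 would run the same two module load commands as the r/3.6.1 listing above.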

Installing R packages

Each R module comes with the base R packages and might not have all the packages required for your project. You can install additional packages from CRAN using the install.packages() command. By default, R tries to install these packages in the folder where R was installed, $EBROOTR. Since users do not have write permission for this folder in the /cvmfs software stack, you will be prompted to install the packages in a personal library in your home directory. Type "yes" and press Enter.

Several R packages use the GNU compilers during installation, so it is generally recommended to load a gcc module along with the R module.

$ module load StdEnv/2020
$ module load gcc/9.3.0 r/4.0.2
$ R
> install.packages('sp', repos='https://cloud.r-project.org/')
Warning in install.packages("sp", repos = "https://cloud.r-project.org/") :
  'lib = "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/r/4.0.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/4.0’
to install packages into? (yes/No/cancel) yes
...
>
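The interactive prompts above are awkward inside a batch script. One possible way around them, sketched below, is to create the personal library directory first and pass it to install.packages() through its lib argument. The helper name install_r_package is hypothetical, and the sketch assumes the R module is already loaded so that Rscript is on the PATH.

```shell
# Install a CRAN package into an explicit library directory without
# answering the interactive prompts. install_r_package is a hypothetical
# helper; it assumes Rscript is available (R module already loaded).
install_r_package() {
    pkg="$1"   # package name, e.g. sp
    lib="$2"   # target library, e.g. ~/R/x86_64-pc-linux-gnu-library/4.0
    # Create the library directory up front so R does not need to ask.
    mkdir -p "$lib" || return 1
    Rscript -e "install.packages('$pkg', repos='https://cloud.r-project.org/', lib='$lib')"
}
```

Called as install_r_package sp "$HOME/R/x86_64-pc-linux-gnu-library/4.0", this targets the same personal library that the prompt in the session above offers to create.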

Note: R packages installed under ~/R/x86_64-pc-linux-gnu-library/ are available only to the R version they were installed for. In this example, the sp package is installed in ~/R/x86_64-pc-linux-gnu-library/4.0 and is available to the 4.0.x versions of R. If you want to use this package with another version, say 3.6, it needs to be installed again.

$ module load nixpkgs/16.09 gcc/7.3.0
$ module load r/3.6.1
$ R
> install.packages('sp', repos='https://cloud.r-project.org/')
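To check where the packages for a given R version end up, the per-version directory can be derived from the version number, as in this sketch. The function name r_user_lib is hypothetical, and the x86_64-pc-linux-gnu platform string is taken from the example on this page, so it is an assumption for any other system.

```shell
# Print the personal library directory for a given R version, following
# the ~/R/x86_64-pc-linux-gnu-library/<major>.<minor> layout shown above.
# r_user_lib is a hypothetical helper; the platform string is assumed.
r_user_lib() {
    version="$1"             # e.g. 4.0.2 or 3.6
    major="${version%%.*}"   # text before the first dot
    rest="${version#*.}"     # text after the first dot
    minor="${rest%%.*}"      # second component only
    echo "$HOME/R/x86_64-pc-linux-gnu-library/$major.$minor"
}
```

Listing the directory printed by r_user_lib 4.0.2 and the one printed by r_user_lib 3.6.1 shows which packages have been installed for each version.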