Difference between revisions of "Software:R"
Latest revision as of 18:58, 30 October 2023
Introduction
R is a language and environment for statistical computing and graphics. It provides a wide variety of statistical techniques (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. Although R is not ideally suited to HPC clusters, its functionality and flexibility for data handling and analysis make it an essential tool for researchers from a wide range of scientific disciplines. In addition, R provides a large repository of user-built packages suited to a wide range of applications.
You can run your R scripts from our RStudio or through a batch script on compute nodes. RStudio provides an IDE for developing and running your programs in a web browser, but it is constrained in terms of the available packages and compute resources. Alternatively, you can run your programs from the command line or a batch script, which gives you the flexibility to choose among the versions of R available on our central software stack and their respective packages. This page is a primer on running R scripts on compute nodes.
Starting R console
As with any software on the cluster, you need to load a specific version of R with the 'module load' command, depending on your requirements. You can load the default version using
$ module load r
Use the module spider command to list all the available versions and their dependencies.
$ module spider r
Several versions of R are available as modules (r/3.3.3 to r/4.1.2). Before loading a specific R version, you need to load all the modules it depends on. These can be determined using the module spider command:
$ module spider r/4.1.2
You will need to load all module(s) on any one of the lines below before the "r/4.1.2" module is available to load.
StdEnv/2020
Now you can load r/4.1.2 using the following module load commands
$ module load StdEnv/2020
$ module load r/4.1.2
Similarly, if you want to load r/3.6.1
$ module spider r/3.6.1
$ module load nixpkgs/16.09 gcc/7.3.0
$ module load r/3.6.1
Typically, you need to load the corresponding gcc compiler before loading R. Note: the StdEnv/2020 module includes gcc/9.3.0 and is sufficient to load r/4.1.2.
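The steps above can be combined into a short, non-interactive sequence. The following is a minimal sketch rather than a site-approved recipe: the file name hello.R is a hypothetical example, and the guards around module and Rscript are assumptions added only so that the snippet exits cleanly on a machine without the cluster software stack.

```shell
# Load R 4.1.2 and its prerequisite (module names taken from the text above).
# The guard skips this step where the 'module' command is not defined.
if command -v module >/dev/null 2>&1; then
    module load StdEnv/2020
    module load r/4.1.2
fi

# Write a tiny R script; hello.R is a hypothetical file name.
cat > hello.R <<'EOF'
cat("Running", R.version.string, "\n")
EOF

# Run the script non-interactively if Rscript is on the PATH.
if command -v Rscript >/dev/null 2>&1; then
    Rscript hello.R
fi
```

Printing R.version.string is a quick way to confirm that the module you intended is the one actually in use.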
Installing R packages
Each R module comes with the base R packages and might not include all the packages required for your project. You can install additional packages from CRAN using the install.packages() command. By default, R tries to install these packages in the folder where R was installed, $EBROOTR. Since users do not have write permission for this folder in the /cvmfs software stack, you will be prompted to install the packages in a personal library in your home directory. Type "yes" and press Enter.
Several R packages use the GNU compilers during installation, so it is generally recommended to load a gcc module along with the R module.
$ module load StdEnv/2020
$ module load gcc/9.3.0 r/4.0.2
$ R
> install.packages('sp', repos='https://cloud.r-project.org/')
Warning in install.packages("sp", repos = "https://cloud.r-project.org/") :
'lib = "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/r/4.0.2/lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/4.0’
to install packages into? (yes/No/cancel) yes
...
>
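The interactive prompts shown above are awkward when installing packages from a script. One possible way around them, sketched here under assumptions, is to create the personal library yourself and pass it to install.packages() through its lib argument. The helper name install_r_package is hypothetical, and the sketch assumes that R sets the R_LIBS_USER environment variable to the personal library path, which is its usual behaviour but has not been checked on this cluster.

```shell
# Hypothetical helper: install one CRAN package into the personal library
# without answering the interactive prompts.
install_r_package() {
    pkg="$1"
    # Each -e expression runs in the same R session; the package name is
    # passed as a trailing argument and read back with commandArgs().
    Rscript \
        -e 'lib <- Sys.getenv("R_LIBS_USER")' \
        -e 'dir.create(lib, recursive = TRUE, showWarnings = FALSE)' \
        -e 'install.packages(commandArgs(trailingOnly = TRUE)[1], lib = lib, repos = "https://cloud.r-project.org/")' \
        "$pkg"
}

# Example call, to be run after loading the gcc and r modules:
# install_r_package sp
```

The call is left commented out so that nothing is downloaded until you have loaded the modules you actually want to install for.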
Note: R packages installed in ~/R/x86_64-pc-linux-gnu-library/ are available only to the R version they were installed with. In this example, the sp package is installed in ~/R/x86_64-pc-linux-gnu-library/4.0 and is available to 4.0.x versions of R. If you want to use this package with another version, say 3.6, it needs to be installed again.
$ module load r/3.6
$ R
> install.packages('sp', repos='https://cloud.r-project.org/')
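The reason for the reinstall is that the personal library path ends in the major.minor part of the R version, so each minor release gets its own directory. The sketch below illustrates that mapping. The function name r_personal_lib is hypothetical, and the platform string x86_64-pc-linux-gnu is simply copied from the example above rather than queried from R.

```shell
# Hypothetical helper: map a full R version to the personal library path
# that R would use, following the pattern seen in the install example.
r_personal_lib() {
    version="$1"    # for example 4.0.2
    # Keep only the major.minor fields, so 4.0.2 becomes 4.0.
    major_minor=$(printf '%s\n' "$version" | cut -d. -f1,2)
    # The platform string is assumed from the example output above.
    printf '%s\n' "$HOME/R/x86_64-pc-linux-gnu-library/$major_minor"
}

# Two different minor versions resolve to two different directories.
r_personal_lib 4.0.2
r_personal_lib 3.6.1
```

Versions that share the same major.minor pair, such as 4.0.0 and 4.0.2, resolve to the same directory and therefore share installed packages.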