CRC Wiki

R


General Description

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

Basic Usage

Adding Packages

Due to the wide range of packages available for R, we are unable to install every one. Fortunately, it is easy for users to install additional libraries. To begin with, load the R module. If the default version is sufficient, this can be done with the command:

 module load R

For convenience, although it is not required, we suggest creating a central directory to hold any packages you install:

 mkdir ~/myRlibs

Next, there are two different ways of installing R packages:

Installing R packages within R

Open an R session and run the following commands:

 install.packages("package_name", lib="install_location", repos="mirror_location")
 library('package_name', lib.loc='install_location')

For our example, this would be:

 install.packages("bizdays", lib="~/myRlibs", repos="http://cran.us.r-project.org")
 library('bizdays', lib.loc='~/myRlibs')
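For batch jobs, it can be convenient to install a package only when it is missing, so that resubmitting a job does not reinstall it every time. The following is a minimal sketch of that pattern. ensure_package() is a hypothetical helper (not part of R or the CRC setup), and its defaults are assumptions: the ~/myRlibs directory suggested above and the CRAN cloud mirror (https://cloud.r-project.org).

```r
# ensure_package() is a hypothetical helper, not part of base R or the CRC setup.
# It installs `pkg` into `lib` only if R cannot already find it, then attaches it.
ensure_package <- function(pkg, lib = "~/myRlibs",
                           repos = "https://cloud.r-project.org") {
  lib <- path.expand(lib)
  # install.packages() needs the target directory to exist
  dir.create(lib, showWarnings = FALSE, recursive = TRUE)
  search_path <- c(lib, .libPaths())
  # requireNamespace() returns FALSE (instead of failing) if pkg is not found
  if (!requireNamespace(pkg, lib.loc = search_path, quietly = TRUE)) {
    install.packages(pkg, lib = lib, repos = repos)
  }
  # character.only = TRUE lets library() take the package name from a variable
  library(pkg, lib.loc = search_path, character.only = TRUE)
}
```

With this defined, ensure_package("bizdays") installs bizdays into ~/myRlibs only if R cannot already find it, and then loads it.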

To avoid having to specify the installation location every time you use this library, you can create a .Renviron file in your home directory with any text editor and add the following line to it:

 R_LIBS=install_location

For our example, this would be:

 R_LIBS=~/myRlibs

Now, we can simply do:

 install.packages("bizdays")
 library(bizdays)
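R reads .Renviron only at startup, so the setting takes effect in a new R session. To confirm that the directory is on R's library search path, you can compare it against .libPaths(), which lists the directories R searches for packages (only directories that actually exist are included). lib_on_path() below is a hypothetical helper written for this check.

```r
# lib_on_path() is a hypothetical helper: TRUE if `dir` is one of the
# directories R currently searches for packages.
lib_on_path <- function(dir) {
  # normalizePath() expands "~" and resolves symlinks, matching how
  # .libPaths() stores its entries
  target <- normalizePath(dir, winslash = "/", mustWork = FALSE)
  target %in% .libPaths()
}

Sys.getenv("R_LIBS")      # should show the value set in ~/.Renviron
lib_on_path("~/myRlibs")
```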

Installing R packages from the command line

You will need to obtain the source archive for the package you want to install. The most common source of packages is the Comprehensive R Archive Network (CRAN). The easiest way to get the archive onto the CRC systems is to copy the link to the file, usually from your browser's right-click menu, and then download it with the wget command:

 wget https://cran.r-project.org/src/contrib/bizdays_1.0.1.tar.gz

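If the link no longer works, that version has probably been superseded: CRAN moves older releases to https://cran.r-project.org/src/contrib/Archive/package_name/. Once you have the archive, decide where to install it, for example the ~/myRlibs directory created above.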

Now, issue the following command to install the package:

 R CMD INSTALL -l install_location package_archive.tar.gz

For our example, this would be:

 R CMD INSTALL -l ~/myRlibs bizdays_1.0.1.tar.gz
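Before setting R_LIBS, you can check from inside R that the package was installed where you expected. installed.packages() lists the packages found in the library directories given by its lib.loc argument; is_installed_in() is a hypothetical wrapper around it, sketched here for illustration.

```r
# is_installed_in() is a hypothetical helper: TRUE if `pkg` is installed
# in the library directory `lib` (other libraries are not consulted).
is_installed_in <- function(pkg, lib) {
  lib <- path.expand(lib)
  # Row names of installed.packages() are the package names
  pkgs <- rownames(installed.packages(lib.loc = lib))
  pkg %in% pkgs
}

is_installed_in("bizdays", "~/myRlibs")
```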

The last step is to tell R the location of our new installation. In a CSH environment, this is:

 setenv R_LIBS install_location

If you are using BASH, it would be:

 export R_LIBS=install_location

Add this command to your .cshrc or .bashrc file, respectively, to permanently set it.

Profiling R Code

Profiling R code helps determine which sections of the code need to be optimized for better performance. To profile R code, use the Rprof() function. Rprof() is a sampling profiler: at regular intervals (every 0.02 seconds by default) it records which functions are on the call stack, so the time it reports for each function is an estimate. Only code that runs after Rprof() is called is profiled; anything executed before the call is not timed. The first argument to Rprof() is the name of the file that will receive the results (Rprof.out if omitted), and summaryRprof() reads that file and reports the time spent in each function.

To profile only a section of the code, place Rprof("file_name") immediately before the code to be profiled and Rprof(NULL) immediately after it; code outside this window is not profiled. The following example shows how Rprof() is used in an actual R script.

 # load sources
 dyn.load("readbfile3_crc.so")
 source("readbfile.r")
 source("snpsel12_data.r")

 Rprof("test1b.out")             # Begin profiling functions

 # try to read in data
 dat.M <- read.bfile("hapmap_sim_chr1_test.bed")

 # try to run snpsel
 selmat.M <- snp_sel(dat.M, k=300, b=10, t=.1)

 Rprof(NULL)                     # Stop profiling functions

 # write selmat for reference
 write.table(selmat.M, file="test_selmat_v1.txt", quote=FALSE, sep=" ", col.names=FALSE, row.names=FALSE)

In the example above, read.bfile() and snp_sel(), as well as the functions they call, are profiled. write.table() runs after Rprof(NULL) and is therefore not profiled.
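Rprof() only writes raw samples to its output file; summaryRprof() turns them into per-function timings. The sketch below wraps both steps. profile_expr() is a hypothetical helper, and the sorting loop is an arbitrary workload chosen only so that enough samples are collected. The helper uses on.exit() so that profiling is switched off even if the profiled code fails.

```r
# profile_expr() is a hypothetical helper, not part of base R.
# It profiles one expression with Rprof() and returns summaryRprof()'s summary.
profile_expr <- function(expr, file = tempfile(fileext = ".out"),
                         interval = 0.01) {
  Rprof(file, interval = interval)   # start sampling every `interval` seconds
  on.exit(Rprof(NULL), add = TRUE)   # stop profiling even if `expr` fails
  force(expr)                        # the lazily passed expression runs here
  Rprof(NULL)                        # stop and flush before reading the file
  summaryRprof(file)
}

# Arbitrary example workload: sort many random vectors
prof <- profile_expr(for (i in 1:200) sort(runif(1e5)))
head(prof$by.self)   # time spent in each function's own code
prof$sampling.time   # total seconds covered by the samples
```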


Parallel Computing in R

By default, R runs code serially in a single process, so parallel computing in R relies on a package. The parallel package, included with R since version 2.14.0, supports single-node multicore execution and socket clusters. Other packages for multi-node distributed computing include Rmpi, snow, snowfall, and papply, while rparallel and fork provide single-node parallelism.
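As an illustration of the snow-style cluster interface in the parallel package, the sketch below starts a socket (PSOCK) cluster of two worker R processes on the local machine and maps a function over them with parLapply(). The worker count of 2 and the offset variable are arbitrary choices for the example; in a CRC job, the number of workers should not exceed the cores you requested.

```r
library(parallel)

# Start two worker R processes on this machine, connected by sockets
cl <- makeCluster(2)

# Workers start with an empty workspace, so global variables they need
# must be copied to them explicitly
offset <- 10

res <- tryCatch({
  clusterExport(cl, "offset")
  parLapply(cl, 1:4, function(i) i^2 + offset)
}, finally = stopCluster(cl))   # always shut the workers down

unlist(res)
```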


Test for Rmpi

Here is an Rmpi test file:

 # Load the Rmpi package
 library(Rmpi)

 # Tell all slaves to return a message identifying themselves
 mpi.remote.exec(paste(Sys.info()[["nodename"]], "checking in as", mpi.comm.rank(), "of", mpi.comm.size()))

 # Tell all slaves to close down, and exit the program
 mpi.close.Rslaves()
 mpi.quit()

Save this as Rmpi-test-on-CRC.R. The following job script runs this Rmpi test on the CRC Grid Engine:

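 #!/bin/tcsh
 # Send email to this address when the job aborts (a), begins (b), and ends (e)
 #$ -M Your_NetID@nd.edu
 #$ -m abe
 # Request 32 slots in the mpi-8 parallel environment
 #$ -pe mpi-8 32

 module load R/3.2.1-gcc

 # Start one R process per granted slot; Grid Engine sets NSLOTS
 mpirun -np $NSLOTS R --no-save -q < Rmpi-test-on-CRC.R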


R at the CRC supports the parallel package. Load it in your R script with:

 library(parallel)

and then set the number of cores that mclapply() should use through the mc.cores option. For example:

 options(mc.cores = 12)
 getOption('mc.cores')
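On a shared compute node, detectCores() reports every core on the machine, not just the ones assigned to your job. A batch job can instead size its workers from the allocation. This sketch assumes the job was submitted to Grid Engine with a parallel environment, so that NSLOTS holds the number of granted slots, and that all slots are on one node, since mclapply() forks processes on the local machine only.

```r
library(parallel)

# Grid Engine sets NSLOTS for parallel jobs; outside a job it is unset,
# in which case we fall back to a single core
n_cores <- as.integer(Sys.getenv("NSLOTS", unset = "1"))
if (is.na(n_cores) || n_cores < 1) n_cores <- 1L

# mclapply() takes its default worker count from getOption("mc.cores")
options(mc.cores = n_cores)

res <- mclapply(1:8, function(i) i^2)
unlist(res)
```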

The following example compares a single-core and a multi-core run in an interactive session:

 module load R
 R

 > library(parallel)
 > detectCores()
 [1] 24
 > options(mc.cores = 12)
 > getOption('mc.cores')
 [1] 12
 > test <- lapply(1:10, function(x) rnorm(100000))
 > system.time(x <- lapply(test, function(x) loess.smooth(x, x)))    # single core
 > system.time(x <- mclapply(test, function(x) loess.smooth(x, x)))  # 12 cores
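The same comparison can also be run non-interactively, for example inside a batch job. The following is a sketch under the assumption that the job runs on a single node; it adds a check that the serial and multi-core runs return identical results, since parallel execution should change only the run time. set.seed() and the smooth_one() helper are additions for this example, and without an explicit mc.cores argument mclapply() uses getOption("mc.cores", 2L) workers.

```r
library(parallel)

set.seed(1)                                   # reproducible test data
test <- lapply(1:10, function(i) rnorm(100000))
smooth_one <- function(x) loess.smooth(x, x)  # hypothetical helper name

# Serial run, then a forked run using getOption("mc.cores", 2L) workers
t_serial <- system.time(res_serial <- lapply(test, smooth_one))
t_multi  <- system.time(res_multi  <- mclapply(test, smooth_one))

# Parallel execution should change the run time, not the answer
stopifnot(identical(res_serial, res_multi))
cat("serial elapsed (s):  ", t_serial[["elapsed"]], "\n")
cat("mclapply elapsed (s):", t_multi[["elapsed"]], "\n")
```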

Related Software

The following list contains software that is related to R.

Further Information

See the official R website: https://www.r-project.org/