Introduction to R

R Tutorial

  Copyright (c) 2013-2021 Aurelien Ginolhac, UL HPC Team  <hpc-sysadmins@uni.lu>

Through this tutorial you will learn how to use R from your local machine or from one of the UL HPC platform clusters. Then, we will see how to organize and group data. Finally we will illustrate how R can benefit from multicore and cluster parallelization.

Warning: this tutorial does not teach the R language itself; it aims to give you useful start-up tips. If you are looking for a good tutorial on R’s data structures, take a look at Hadley Wickham’s page. Another book, built with bookdown and freely available, is R for Data Science by Garrett Grolemund & Hadley Wickham.


Pre-requisites

Ensure you are able to connect to the UL HPC clusters

You MUST work on a computing node:

# /!\ FOR ALL YOUR COMPILING BUSINESS, ENSURE YOU WORK ON A COMPUTING NODE
(access-iris)$> si -c 2 -t 1:00:00

Optional: On your local machine

First of all, let’s install R. You will find releases for various distributions available at CRAN Archive.

You may also find the RStudio graphical IDE handy.

On the HPC, R is available as a module:

module load lang/R

We also need pandoc, which is bundled with RStudio but not installed on the HPC.

We fetch the binary and copy it into our own ~/bin folder, which should be included in your PATH.

wget -qO- https://github.com/jgm/pandoc/releases/download/2.16.1/pandoc-2.16.1-linux-amd64.tar.gz | tar xfz - 
mkdir -p ~/bin/
cp pandoc-2.16.1/bin/pandoc ~/bin/
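
The commands above only help if ~/bin is actually on your PATH. Here is a minimal sketch for checking this and adding it to the current shell when it is missing. The helper name path_prepend is hypothetical and not part of the tutorial.

```shell
# Hypothetical helper: prepend a directory to PATH only if it is not already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$1:$PATH" ;;     # otherwise put it first
  esac
}

path_prepend "$HOME/bin"
export PATH

# Report which pandoc (if any) the shell now resolves.
if command -v pandoc >/dev/null 2>&1; then
  echo "pandoc found: $(command -v pandoc)"
else
  echo "pandoc not found on PATH"
fi
```

To make the change permanent, the same PATH line can go into your shell start-up file (for instance ~/.bashrc, assuming bash is your login shell).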

Cloning

  1. If you haven’t cloned the tutorials repository, do it with git clone https://github.com/ULHPC/tutorials.git

  2. Move into the R tutorial folder of your clone: cd tutorials/maths/R

  3. Open an R session (jdoe@localhost:~$ R) or create a New Project in RStudio
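
The first two steps can be sketched as a small shell helper that clones only when no checkout exists yet. The function name clone_if_missing is hypothetical; the URL and folder names are the ones given in the steps above.

```shell
# Hypothetical helper: clone a repository only when the target folder
# does not already contain a git checkout.
clone_if_missing() {
  url="$1"
  dir="$2"
  if [ -d "$dir/.git" ]; then
    echo "already cloned: $dir"
  else
    git clone "$url" "$dir"
  fi
}

# Usage (needs network access, so it is left commented out here):
# clone_if_missing https://github.com/ULHPC/tutorials.git tutorials
# cd tutorials/maths/R
```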

Enclosed R packages environment

We will use renv to synchronize the packages used in this practical with yours. Having a dedicated project library lets you retrieve a specific set of packages without interfering with other projects.

Of note, when R is started in the tutorials/maths/R folder, the .Rprofile is detected and you should see renv bootstrap automatically:

# Bootstrapping renv 0.14.0 --------------------------------------------------
* Downloading renv 0.14.0 ... OK (downloaded source)
* Installing renv 0.14.0 ... Done!
* Successfully installed and loaded renv 0.14.0.
* Project '/mnt/lscratch/users/aginolhac/tutorials/maths/R' loaded. [renv 0.14.0]
* The project library is out of sync with the lockfile.
* Use `renv::restore()` to install packages recorded in the lockfile.
Warning message:
Project requested R version '4.1.0' but '4.0.5' is currently being used

Otherwise:

  4. Install renv
install.packages("renv")

Then, restore the package list locally:

  5. Restore the set of packages for this practical
renv::restore()

The list of packages to install is displayed, followed by a prompt:

[...]
# GitHub =============================
- tarchetypes      [* -> ropensci/tarchetypes@main]
- targets          [* -> ropensci/targets@main]

Do you want to proceed? [y/N]: 

Enter y and wait (this takes 5 to 10 minutes).

The packages are installed into a local cache shared across projects, so a package already installed for another project is simply linked rather than reinstalled. This saves a lot of time while keeping each project’s package environment isolated.
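
When the restore has to run unattended, for example inside a batch job, the y/N prompt can be skipped with the prompt argument of renv::restore(). Below is a minimal sketch. The helper name restore_packages is hypothetical, and it assumes the lockfile uses renv's default name renv.lock.

```shell
# Hypothetical helper: restore the renv lockfile of a project without the y/N prompt.
restore_packages() {
  project="${1:-.}"
  # Assumption: the lockfile has renv's default name, renv.lock.
  if [ ! -f "$project/renv.lock" ]; then
    echo "no renv.lock in $project" >&2
    return 1
  fi
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found; load the R module first" >&2
    return 2
  fi
  # Start R inside the project so that its .Rprofile activates renv.
  (cd "$project" && Rscript -e 'renv::restore(prompt = FALSE)')
}

# Usage, from the tutorials/maths/R folder:
# restore_packages .
```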

It takes time but who is really using R with base only?

  6. Run targets for the datasauRus example

Either knit the file datasauRus.Rmd in RStudio or run rmarkdown::render("datasauRus.Rmd") in a console.

  7. Run targets for the gapminder example

Either knit the file gapminder.Rmd in RStudio or run rmarkdown::render("gapminder.Rmd") in a console.

  • gapminder.html documents both the pipeline description and its run
  • report_gap.html is rendered dynamically
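
On a computing node without RStudio, the render calls for datasauRus.Rmd and gapminder.Rmd can also be launched from the shell through Rscript. This is a minimal sketch under that assumption; the helper name render_rmd is hypothetical.

```shell
# Hypothetical helper: render one R Markdown file without opening an R console.
render_rmd() {
  rmd="$1"
  if [ ! -f "$rmd" ]; then
    echo "no such file: $rmd" >&2
    return 1
  fi
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found; load the R module first" >&2
    return 2
  fi
  # The file name is passed as a trailing argument rather than pasted into the R code.
  Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' "$rmd"
}

# Usage, from the tutorials/maths/R folder:
# render_rmd datasauRus.Rmd
# render_rmd gapminder.Rmd
```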

7bis. Try to turn on multi-process execution using future

Uncomment the following lines

library(future)
#library(future.callr)
#plan(callr)

and replace tar_make() with

tar_make_future(workers = 2)

  8. Re-knit gapminder.Rmd: all targets are skipped, since they are already up to date.

  9. Modify one value in the raw data file gapminder.tsv and re-knit gapminder.Rmd

  10. Change the linear regression to lifeExp ~ year1950 + gdpPercap and re-knit gapminder.Rmd