Introduction to R
R Tutorial
Copyright (c) 2013-2021 Aurelien Ginolhac, UL HPC Team <hpc-sysadmins@uni.lu>
Through this tutorial you will learn how to use R from your local machine or from one of the UL HPC platform clusters. Then, we will see how to organize and group data. Finally, we will illustrate how R can benefit from multicore and cluster parallelization.
Warning: this tutorial does not teach the R language itself; it aims at showing you useful start-up tips. If you are looking for a good tutorial on R’s data structures, take a look at Hadley Wickham’s page. Another book, built with bookdown, is available for free: R for Data Science by Garrett Grolemund & Hadley Wickham.
Pre-requisites
Ensure you are able to connect to the UL HPC clusters.
You MUST work on a computing node:
# /!\ FOR ALL YOUR COMPILING BUSINESS, ENSURE YOU WORK ON A COMPUTING NODE
(access-iris)$> si -c 2 -t 1:00:00
Optional: On your local machine
First of all, let’s install R. You will find releases for various distributions available at CRAN Archive.
You will also find it handy to use the RStudio graphical IDE.
On HPC, available as a module
module load lang/R
We also need pandoc, which is supplied with RStudio but not available on HPC. We fetch the binary and copy it to our own ~/bin, which should be included in your PATH.
wget -qO- https://github.com/jgm/pandoc/releases/download/2.16.1/pandoc-2.16.1-linux-amd64.tar.gz | tar xfz -
mkdir -p ~/bin/
cp pandoc-2.16.1/bin/pandoc ~/bin/
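The copied binary is only found if ~/bin is actually on your PATH, so a quick check may help. The following is a minimal sketch: the helper name path_contains is hypothetical (not part of the tutorial), and the export line only affects the current shell session.

```shell
# Hypothetical helper: succeeds if directory $1 is one of the entries of $PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prepend ~/bin for the current session only if it is missing.
if ! path_contains "$HOME/bin"; then
  export PATH="$HOME/bin:$PATH"
fi
```

To make the change permanent, the export line would typically go into your shell start-up file (for example ~/.bashrc, assuming bash). Afterwards, command -v pandoc should print the path of the copied binary.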
Cloning
- If you haven’t cloned the tutorials repository, do it with git clone https://github.com/ULHPC/tutorials.git
- In your cloned tutorials repository, cd tutorials/maths/R
- Open an R session
jdoe@localhost:~$ R
or create a New Project in RStudio
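If you are unsure whether the repository is already cloned, the clone step can be made safe to re-run. This is only a sketch: clone_if_missing is a hypothetical helper name, and it assumes that an existing .git directory means the clone is already in place.

```shell
# Hypothetical helper: clone repository $1 into directory $2 unless $2
# already contains a Git working copy.
clone_if_missing() {
  if [ -d "$2/.git" ]; then
    echo "already cloned: $2"
  else
    git clone "$1" "$2"
  fi
}

# Usage (performs a network clone on the first run only):
#   clone_if_missing https://github.com/ULHPC/tutorials.git tutorials
#   cd tutorials/maths/R
```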
Enclosed R packages environment
We will use renv to synchronize the packages of this practical with yours. Having a dedicated project library lets you install a specific set of packages without interfering with other projects.
Of note, the .Rprofile is detected when R is started in the tutorials/maths/R folder, and you should see the automatic bootstrapping of renv:
# Bootstrapping renv 0.14.0 --------------------------------------------------
* Downloading renv 0.14.0 ... OK (downloaded source)
* Installing renv 0.14.0 ... Done!
* Successfully installed and loaded renv 0.14.0.
* Project '/mnt/lscratch/users/aginolhac/tutorials/maths/R' loaded. [renv 0.14.0]
* The project library is out of sync with the lockfile.
* Use `renv::restore()` to install packages recorded in the lockfile.
Warning message:
Project requested R version '4.1.0' but '4.0.5' is currently being used
Otherwise:
- Install renv
install.packages("renv")
Then, restore the package list locally:
- Restore the set of packages for this practical
renv::restore()
The list of packages is displayed, followed by a prompt:
[...]
# GitHub =============================
- tarchetypes [* -> ropensci/tarchetypes@main]
- targets [* -> ropensci/targets@main]
Do you want to proceed? [y/N]:
Enter y and wait (the installation takes 5 to 10 minutes).
The packages are copied to a local cache shared across projects. The same package installed from another folder is then only linked, which saves a lot of time while preserving the enclosed package environments.
It takes time, but who is really using R with base only?
- Run targets for the datasauRus example
Either knit the file datasauRus.Rmd in RStudio, or run rmarkdown::render("datasauRus.Rmd") in a console.
- Run targets for the gapminder example
Either knit the file gapminder.Rmd in RStudio, or run rmarkdown::render("gapminder.Rmd") in a console.
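On a computing node without RStudio, the same render call can also be issued directly from the shell. A minimal sketch, assuming Rscript is on the PATH after module load lang/R and that rmarkdown has been installed by renv::restore(); render_rmd is a hypothetical helper name.

```shell
# Hypothetical helper: render the R Markdown file $1 with rmarkdown::render().
# Returns non-zero if the file is missing or Rscript is not available.
render_rmd() {
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found; load the R module first" >&2
    return 1
  fi
  # The file name is passed as an argument rather than pasted into the R code.
  Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' "$1"
}

# Usage, from the tutorials/maths/R folder:
#   render_rmd datasauRus.Rmd
#   render_rmd gapminder.Rmd
```

Running it from the tutorials/maths/R folder matters: that is where the .Rprofile lives, so the renv project library is activated before rendering.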
gapminder.html documents both the pipeline description and its run; report_gap.html was dynamically rendered.
- Optional variant of the gapminder step: try to turn on multi-process execution using future
Un-comment the lines
library(future)
#library(future.callr)
#plan(callr)
and replace tar_make() by
tar_make_future(workers = 2)
- Re-knit gapminder.Rmd: all targets are skipped.
- Modify one value in gapminder.tsv, the raw data, and re-knit gapminder.Rmd
- Change the linear regression to lifeExp ~ year1950 + gdpPercap and re-knit gapminder.Rmd
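The three experiments above rely on the pipeline skipping work whose inputs have not changed and re-running only what depends on a modification. As a rough analogy only (this is not how targets is implemented internally), content-based change detection can be sketched in the shell with POSIX cksum. The helper name and the stamp-file argument are hypothetical.

```shell
# Hypothetical helper: succeed (exit 0) if file $1 differs from the checksum
# recorded in stamp file $2, and record the new checksum; fail otherwise.
changed_since_last_run() {
  new=$(cksum < "$1")
  old=$(cat "$2" 2>/dev/null || true)
  if [ "$new" = "$old" ]; then
    return 1   # unchanged: dependent steps could be skipped
  fi
  printf '%s\n' "$new" > "$2"
  return 0     # changed (or first run): dependent steps must be re-run
}

# Usage:
#   changed_since_last_run gapminder.tsv .gapminder.stamp && echo "re-run needed"
```

A checksum is used instead of a modification time so that merely touching the file does not count as a change, while editing a single value does.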