Introduction to R
Copyright (c) 2013-2021 Aurelien Ginolhac, UL HPC Team <firstname.lastname@example.org>
In this tutorial, you will learn how to use R from your local machine or from one of the UL HPC platform clusters. Then, we will see how to organize and group data. Finally, we will illustrate how R can benefit from multicore and cluster parallelization.
Warning: this tutorial does not focus on learning the R language but aims to show you useful start-up tips. If you are also looking for a good tutorial on R's data structures, take a look at Hadley Wickham's page. Another book written with bookdown is available for free: R for Data Science by Garrett Grolemund & Hadley Wickham.
Ensure you are able to connect to the UL HPC clusters
You MUST work on a computing node:
# /!\ FOR ALL YOUR COMPILING BUSINESS, ENSURE YOU WORK ON A COMPUTING NODE
(access-iris)$> si -c 2 -t 1:00:00
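Once the interactive job is running, you can check from R that you are on a compute node. The sketch below makes two assumptions. First, it assumes the cluster uses Slurm, so that SLURM_JOB_ID is set inside a job and SLURM_CPUS_PER_TASK is set when -c is given. Second, on_compute_node() and allocated_cores() are hypothetical helpers, not part of any package.

```r
# Hypothetical helpers, assuming a Slurm cluster: SLURM_JOB_ID is set inside
# a job, SLURM_CPUS_PER_TASK only when cores were requested with -c.
on_compute_node <- function() {
  nzchar(Sys.getenv("SLURM_JOB_ID"))
}

allocated_cores <- function(default = 1L) {
  # Empty or missing variable becomes NA, in which case we fall back to default
  n <- suppressWarnings(as.integer(Sys.getenv("SLURM_CPUS_PER_TASK")))
  if (is.na(n)) default else n
}

if (!on_compute_node()) {
  message("Not inside a Slurm job: request a compute node before working.")
} else {
  message("Running in job ", Sys.getenv("SLURM_JOB_ID"),
          " with ", allocated_cores(), " core(s)")
}
```

The core count can later be reused as a number of parallel workers, instead of hard-coding it.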
Optional: On your local machine
First of all, let's install R. You will find releases for various distributions on the CRAN Archive.
You may also find the RStudio graphical IDE handy.
On HPC, R is available as a module:
module load lang/R
We also need pandoc, which is supplied with RStudio but not available on HPC. We fetch the binary and copy it into our own ~/bin, which should be included in your PATH:
wget -qO- https://github.com/jgm/pandoc/releases/download/2.16.1/pandoc-2.16.1-linux-amd64.tar.gz | tar xfz -
mkdir -p ~/bin/
cp pandoc-2.16.1/bin/pandoc ~/bin/
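You can verify the installation from R. Base R's Sys.which() returns the full path of an executable found on the PATH, or an empty string if it finds none. The sketch below assumes pandoc was copied into ~/bin as described above.

```r
# Is ~/bin on the PATH, and is a pandoc binary found there?
# (rmarkdown::pandoc_available() offers a similar check when rmarkdown is installed.)
bin_dir   <- normalizePath("~/bin", mustWork = FALSE)
path_dirs <- strsplit(Sys.getenv("PATH"), .Platform$path.sep, fixed = TRUE)[[1]]
path_dirs <- normalizePath(path_dirs, mustWork = FALSE)

bin_on_path <- bin_dir %in% path_dirs
pandoc_path <- Sys.which("pandoc")   # "" when not found

if (!bin_on_path) {
  message("~/bin is not on PATH: add it in your shell startup file")
}
if (nzchar(pandoc_path)) {
  message("pandoc found at ", pandoc_path)
} else {
  message("pandoc not found on PATH")
}
```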
If you haven't cloned the tutorials repository, do it with:
git clone https://github.com/ULHPC/tutorials.git
In your cloned tutorials repository, go to the maths/R folder.
Open an R session:
jdoe@localhost:~$ R
or create a New Project in RStudio.
Enclosed R packages environment
We will use renv to synchronize the packages of this practical with yours. Having a dedicated project library lets you retrieve a specific set of packages without messing up other projects.
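Here is a conceptual base R sketch of a dedicated project library. A folder owned by the project is placed first in .libPaths(), so packages are looked up and installed there first. This is roughly the arrangement renv sets up for you, not its actual implementation. The folder names are hypothetical.

```r
# Conceptual sketch of a project library (roughly what renv arranges):
# the project owns a library folder that is searched before the others.
project_dir <- file.path(tempdir(), "my_project")   # hypothetical project
project_lib <- file.path(project_dir, "library")
dir.create(project_lib, recursive = TRUE, showWarnings = FALSE)

old_libs <- .libPaths()
.libPaths(c(project_lib, old_libs))   # project library comes first
first_lib <- .libPaths()[1]

# install.packages() defaults to lib = .libPaths()[1], so from here it would
# install into this project only, leaving other projects' libraries untouched.
print(.libPaths())

.libPaths(old_libs)   # restore the previous search path
```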
Note that the .Rprofile is detected when R is started in the tutorials/maths/R folder, and you should see renv being bootstrapped automatically:
# Bootstrapping renv 0.14.0 --------------------------------------------------
* Downloading renv 0.14.0 ... OK (downloaded source)
* Installing renv 0.14.0 ... Done!
* Successfully installed and loaded renv 0.14.0.
* Project '/mnt/lscratch/users/aginolhac/tutorials/maths/R' loaded. [renv 0.14.0]
* The project library is out of sync with the lockfile.
* Use `renv::restore()` to install packages recorded in the lockfile.
Warning message:
Project requested R version '4.1.0' but '4.0.5' is currently being used
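The final warning in this log means the lockfile was recorded with R 4.1.0, while R 4.0.5 is loaded. The sketch below performs the same comparison in base R. It uses a hypothetical check_r_version() helper, and the requested version is typed by hand; renv reads that version from renv.lock.

```r
# Hypothetical helper: compare the running R version with a requested one.
check_r_version <- function(requested) {
  current <- getRversion()
  wanted  <- package_version(requested)
  if (current < wanted) {
    warning(sprintf("requested R %s but R %s is running",
                    as.character(wanted), as.character(current)))
  }
  current >= wanted   # TRUE when the running R is recent enough
}

check_r_version("4.1.0")
```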
Then, restore the package list locally:
# Restore the package set for this practical
renv::restore()
The list of packages is displayed and you are asked for confirmation:
[...]
# GitHub =============================
- tarchetypes   [* -> ropensci/tarchetypes@main]
- targets       [* -> ropensci/targets@main]

Do you want to proceed? [y/N]:
Answer y and wait (this takes 5 to 10 minutes).
The packages are copied into a local cache shared across projects, so the same package installed in another project folder is only linked. This saves a lot of time while keeping each project's package environment enclosed.
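The cache mechanism can be sketched with symbolic links: a package is stored once, and every project library links to that single copy. This is a simplified illustration with hypothetical folder names. It assumes a Unix-like system where file.symlink() can link directories, and renv's real cache layout is more elaborate.

```r
# Simplified sketch of a shared package cache with symbolic links.
root  <- file.path(tempdir(), "cache_demo")
cache <- file.path(root, "cache", "mypkg")            # the single stored copy
dir.create(cache, recursive = TRUE, showWarnings = FALSE)
writeLines("package files", file.path(cache, "DESCRIPTION"))

link_into_project <- function(project) {
  lib <- file.path(root, project, "library")
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  link <- file.path(lib, "mypkg")
  if (!file.exists(link)) file.symlink(cache, link)   # link, do not copy
  link
}

link_a <- link_into_project("projectA")
link_b <- link_into_project("projectB")

# Both projects read the same files, with only one copy on disk
readLines(file.path(link_b, "DESCRIPTION"))
```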
It takes time but who is really using R with
targets for the datasauRus example
Either knit the file datasauRus.Rmd in RStudio, or run
rmarkdown::render("datasauRus.Rmd") in a console.
targets for the gapminder example
Either knit the file gapminder.Rmd in RStudio, or run
rmarkdown::render("gapminder.Rmd") in a console.
gapminder.html documents both the pipeline description and its run.
report_gap.html was dynamically rendered.
7bis. Try to enable multi-process execution using future.
Uncomment the lines:
library(future)
#library(future.callr)
#plan(callr)
and use:
tar_make_future(workers = 2)
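With tar_make_future(workers = 2), independent targets can be dispatched to two R processes. The same idea can be illustrated with base R's parallel package. This is a swapped-in technique for illustration only, not how targets or future work internally. slow_square() is a hypothetical stand-in for an expensive step.

```r
library(parallel)

# Hypothetical expensive step
slow_square <- function(x) {
  Sys.sleep(0.1)   # simulate computation time
  x^2
}

cl <- makeCluster(2)                     # 2 background R worker processes
res_parallel <- parLapply(cl, 1:6, slow_square)
stopCluster(cl)                          # always release the workers

res_sequential <- lapply(1:6, slow_square)
identical(res_parallel, res_sequential)  # same results, computed in parallel
```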
Re-knit gapminder.Rmd: all targets are skipped.
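Targets are skipped because nothing they depend on has changed since the last run. The sketch below shows that skip logic for a single input file, using file hashes from tools::md5sum(). run_if_changed() is a hypothetical helper, and the input is a toy file. targets itself also tracks code and upstream targets, which this sketch ignores.

```r
# Minimal sketch of make-like skipping: re-run a step only when the hash
# of its input file differs from the one stored after the previous run.
hash_store <- new.env()

run_if_changed <- function(input, step) {
  new_hash <- unname(tools::md5sum(input))
  old_hash <- hash_store[[input]]          # NULL on the first run
  if (!is.null(old_hash) && identical(old_hash, new_hash)) {
    message("skip ", basename(input))
    return(invisible(FALSE))
  }
  step(input)
  assign(input, new_hash, envir = hash_store)
  invisible(TRUE)
}

raw <- tempfile(fileext = ".tsv")          # toy input file
writeLines(c("x\ty", "1\t2"), raw)

count_rows <- function(f) nrow(read.delim(f))
first  <- run_if_changed(raw, count_rows)  # runs: no stored hash yet
second <- run_if_changed(raw, count_rows)  # skipped: input unchanged
writeLines(c("x\ty", "1\t3"), raw)         # modify one value
third  <- run_if_changed(raw, count_rows)  # runs again: input changed
```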
Modify one value in gapminder.tsv, the raw data, and re-knit.
Change the linear regression to lifeExp ~ year1950 + gdpPercap and re-knit.
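The modified model can be fitted with lm(). The sketch below uses a toy data frame with made-up values, not the gapminder data. It assumes year1950 is the year shifted so that 1950 becomes 0. Under that assumption, the intercept is the predicted life expectancy in 1950 for a GDP per capita of 0.

```r
# Toy data frame with made-up values (not gapminder)
toy <- data.frame(
  year      = c(1952, 1957, 1962, 1967, 1972, 1977),
  lifeExp   = c(40.1, 42.3, 45.0, 47.2, 50.1, 52.4),
  gdpPercap = c(800, 900, 1100, 1200, 1500, 1700)
)
toy$year1950 <- toy$year - 1950   # assumption: years counted from 1950

fit <- lm(lifeExp ~ year1950 + gdpPercap, data = toy)
coef(fit)
```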