Description:
Are you using R but not sure if your R code makes the best use of the computing resources available? Would you like to learn to speed up R analyses by parallel computing, identify bottlenecks in your R scripts, or get tips on handling large datasets in R? Join our new course that focuses on using R efficiently and making the most of R in a high performance computing environment.
The topics of this course include:
- making use of the properties of R as a programming language to write efficient R code
- exploring performance issues of R code by benchmarking and profiling processes and memory usage
- parallel and distributed computing with R on both local and supercomputing resources
The topics will be covered using short lectures and/or demonstrations followed by hands-on exercises using RStudio and batch jobs on the supercomputer Puhti. Participants are welcome to bring their own R code (short script sections, not full projects) and a small data set (maximum 5 GB) to be used in some of the exercises (but note that we do not solve any problems with the code itself).
Target audience:
This course is meant for anyone familiar with the basics of R and wanting to learn how to make their analyses in R more efficient and how to use R in a high performance computing environment. For example:
- current users of RStudio in CSC’s Puhti web interface: move beyond RStudio and make the most of the computing resources of the supercomputer
- R users running R on their own computer so far: use your computer’s resources efficiently and learn to use R in a high performance computing environment
- experienced users of another programming language and/or high performance computing: get familiar with the functional nature of the R language and its resource management
Where & when:
This is a two-day course from 9:00 to 16:00. The course will be offered on-site at the CSC Training Facilities (Keilaranta 14, Espoo, Finland). A Zoom link can be provided to participants not able to join on-site, but please note that this is not a hybrid course, so online participants will be offered limited support. For participants joining the course on-site in Espoo, lunch and a snack are included in the price.
Learning outcomes:
After attending this course, participants will be able to:
- explore potential R code performance issues with benchmarking and profiling
- understand the key properties of the R language and how they relate to the computer’s resource management
- run R scripts with the batch job system on the supercomputer Puhti
- get started with parallel and distributed computing with R
Pre-requisites:
Required:
- basics of the R programming language
- if you are a complete beginner with R and programming in general, we recommend the course Data Analysis with R instead
Useful to make the most of the course content but not required:
- basics of Linux (for example: https://csc-training.github.io/csc-env-eff/part-1/prerequisites/)
- some experience in using the supercomputer Puhti, for example through RStudio in the Puhti web interface (puhti.csc.fi), the course CSC Computing Environment, Part 1: Basics, or the corresponding self-learning materials (https://csc-training.github.io/csc-env-eff/)
Lecturers:
Billy Braithwaite and Heli Juottonen (CSC)