Are you using R but wondering whether your R code makes the best use of the computing resources available? Would you like to learn to speed up R analyses with parallel computing, identify bottlenecks in your R scripts, or get tips on handling large datasets in R? Join our course that focuses on using R efficiently and making the most of R in a high performance computing environment.
Topics of the course include:
- making use of the properties of R as a programming language to write efficient R code
- exploring performance issues of R code by benchmarking and profiling processes and memory usage
- parallel and distributed computing with R on both local and supercomputing resources
The topics will be covered using short lectures and/or demonstrations followed by hands-on exercises using RStudio and batch jobs on the supercomputer Puhti.
Advanced participants are welcome to bring their own R code (a short section of a script you are already familiar with) and a small dataset (maximum 5 GB) to go with the script, to be used in some of the exercises. We expect that you are able to read the data into R and run the script independently.
Target audience:
This course is meant for anyone familiar with the basics of R and wanting to learn how to make their analyses in R more efficient and how to use R in a high performance computing environment. For example:
- current users of RStudio in CSC’s Puhti web interface: move beyond RStudio and make the most of the computing resources of the supercomputer
- R users running R on their own computer so far: use your computer’s resources efficiently and learn to use R in a high performance computing environment
- experienced users of another programming language and/or high performance computing: get familiar with the functional nature of the R language and its resource management
Where & when:
This is a two-day course from 9:00 to 16:00. The course will be offered on-site at the CSC Training Facilities (Keilaranta 14, Espoo, Finland) and online. For the best experience and if you anticipate needing a lot of support with the course exercises (see the pre-requisites below), we recommend on-site participation. For participants joining the course on site in Espoo, lunch and a snack are included in the price.
Learning outcomes:
After attending this course, participants will be able to:
- explore potential R code performance issues with benchmarking and profiling
- understand the key properties of the R language and how they relate to the computer’s resource management
- run R scripts with the batch job system on the supercomputer Puhti
- get started with parallel and distributed computing with R
Pre-requisites:
Required:
- basics of R
‘Basics of R’ can mean many things, but if the following things are familiar to you, you should be good to go: basic R syntax, running and writing R code, data structures and types, using packages, using variables, functions and pipes, and basic operations for data wrangling. If you know how to write a for loop or a function in R, you will definitely be fine!
If you are a complete beginner with R and programming in general, we recommend the course Data Analysis with R instead.
Useful to make the most of the course content but not required:
- basics of Linux (for example: https://csc-training.github.io/csc-env-eff/hands-on/linux_prerequisites/basic-linux-commands.html)
- some experience in using a supercomputer, for example from using RStudio in the Puhti web interface (puhti.csc.fi), from the course CSC Computing Environment, Part 1: Basics, or from the corresponding self-learning materials (https://csc-training.github.io/csc-env-eff/)
Lecturers:
Heli Juottonen and Maciej Janicki (CSC)
Registration deadline: 9.11.2025