Modern HPC systems combine CPUs and accelerators such as GPUs or FPGAs, making code optimization for diverse platforms time-consuming. Cross-platform portability ecosystems provide a higher-level abstraction layer, simplifying parallel programming in shared memory environments. For C++, examples include SYCL, Kokkos, and standard C++.
- SYCL, an open standard by Khronos Group, offers a unified C++ layer for diverse devices, achieving parallel execution on CPUs, GPUs, FPGAs, and more.
- Kokkos Core, a C++ framework, enables high-performance applications across HPC platforms, addressing challenges of intricate node architectures. Kokkos supports various backend programming models like CUDA, HIP, SYCL, HPX, OpenMP, and C++ threads.
- Standard C++ has included parallel algorithms with execution policies since C++17, and some compilers can offload these algorithms to GPUs.
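To give a flavour of the standard C++ route in the list above, here is a minimal sketch of a SAXPY update written with a C++17 parallel algorithm. The function name `saxpy` and its signature are illustrative assumptions, not taken from the course material. Whether the loop actually runs on a GPU depends on the compiler and its flags (for example, NVIDIA's `nvc++` with `-stdpar`). With an ordinary host compiler it runs on the CPU.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Hypothetical helper for illustration: computes y[i] = a * x[i] + y[i].
// Assumes y holds at least as many elements as x.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(y.size() >= x.size());
    // par_unseq permits (but does not require) parallel and vectorized
    // execution; the implementation decides where and how the work runs.
    std::transform(std::execution::par_unseq,
                   x.begin(), x.end(),   // first input range
                   y.begin(),            // second input range
                   y.begin(),            // output, written in place over y
                   [a](float xi, float yi) { return a * xi + yi; });
}
```

Calling `saxpy(2.0f, x, y)` with `x = {1, 2, 3}` and `y = {10, 20, 30}` leaves `y` holding `{12, 24, 36}`. The same kernel can be expressed in SYCL or Kokkos with explicit control over devices and memory, which the standard C++ version leaves to the compiler and runtime.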
This training introduces GPU programming with SYCL, Kokkos, and standard C++ for writing portable and performant accelerated applications. The course consists of lectures and hands-on sessions on the LUMI and Mahti supercomputers, which feature AMD and Nvidia GPUs. At the end of the training, participants also have the opportunity to apply what they have learned to their own coding projects and real-world application scenarios.
Where and when:
Wednesday 27th – Friday 29th November
This is an on-site event at the CSC training facilities, Keilaranta 14, Espoo, Finland.
Learning outcome:
At the end of this training, participants will be able to:
- write hardware-agnostic code to express parallelism using SYCL, standard C++ and Kokkos that can run on CPUs and GPUs
- manage memory across devices
- do basic performance analysis
- evaluate the trade-offs between different approaches to programming GPUs
Prerequisites:
This course targets developers who know C++ and would like to learn how to program GPUs, as well as developers who already program GPUs using a non-portable approach such as CUDA or HIP and would like to write performant code that runs on various computing platforms. To follow the course, participants are expected to have basic familiarity with C++ concepts such as raw pointers, classes, structures, templates, lambdas, and functors.
The content level of the course is broken down as: beginner’s – 70%, intermediate – 20%, advanced – 10%, community-targeted content – 0%.
Program (coarse grained):
Day 1, Wednesday 27.11, 9:00-17:00
09:00-11:00 Introduction to GPUs and GPU parallel programming model
11:00-12:00 Refresher of C++ concepts
12:00-13:00 Lunch break
13:00-16:45 SYCL I
16:45-17:00 Day 1 wrap-up
Day 2, Thursday 28.11, 9:00-17:00
09:00-12:00 SYCL II
12:00-13:00 Lunch break
13:00-15:00 SYCL III
15:00-16:45 Mahti and LUMI (SYCL installation, usage and exercises)
16:45-17:00 Day 2 wrap-up
Day 3, Friday 29.11, 9:00-17:00
09:00-10:00 Interoperability with third-party libraries, multi-GPU and multi-node programming
10:00-12:00 Kokkos
12:00-13:00 Lunch break
13:00-16:45 Exercises & Bring your own code
16:45-17:00 Day 3 wrap-up & Course closing
The previous training material can be viewed at https://github.com/csc-training/high-level-gpu-programming/releases/tag/hlgp-feb-2024