This course covers advanced topics in code optimization for x86 platforms (Intel and AMD CPUs). We discuss techniques for analyzing and maximizing both single-core and multi-core performance within a single node. The topics include instruction-level parallelism, vectorization, and efficient utilization of cache and memory. The course consists of lectures and hands-on exercises.
Learning outcomes
- Awareness of features and internal workings of x86 CPUs
- Ability to analyze and assess single-node performance
- Ability to vectorize computations
- Ability to optimize cache and memory access
Prerequisites
- Good knowledge of C/C++ or Fortran
- Good knowledge of threading using OpenMP
- Basic knowledge of modern CPU architectures
Agenda
Day 1
- Overview of performance engineering
- General overview of modern multicore CPUs
- Main memory performance
- Performance analysis tools
Day 2
- Deeper dive into caches
- Detailed look into Intel and AMD CPUs
- Advanced vectorization
- Additional optimization topics
Registration deadline: 3.5.2024