Part I: “Introduction to high-performance computing technology CUDA”
This course covers the theoretical and practical principles of the massively parallel approach to high-performance computing, using multiprocessor systems and/or a combination of GPU hardware and a specialised software environment. The seminar gives an overview of high-performance computing hardware and software architectures, computing algorithms, application libraries and tools. Particular attention is paid to applied interdisciplinary uses of the GPU-based parallel computing platform CUDA, e.g. analysis of large volumes of data, image processing, and machine learning tasks. Alongside the theory, participants can acquire basic skills in developing IT solutions with CUDA.
Day 1
CUDA architecture (30 min):
- History of GPU development
- Types of GPU architecture supported by CUDA
CUDA programming (30 min):
- CUDA programming model
- Basic principles of CUDA programming
- Concepts of threads and blocks
- Data exchange between the CPU and GPU
Parallel algorithms in the CUDA environment (30 min):
- Parallel reduction
- Prefix sum (scan)
Practical task: exercises in developing simple CUDA programs (2 h)
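The Day 1 topics (threads and blocks, data exchange between the CPU and GPU, parallel reduction) can be illustrated by a minimal sketch such as the one below. It is not taken from the course materials: the names `block_sum` and `reduce_sum`, the block size of 256 threads, and the choice to add up the per-block partial sums on the CPU are assumptions made for illustration, and error checking of the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative kernel (hypothetical name): each block loads up to blockDim.x
// elements into shared memory and reduces them to a single partial sum.
__global__ void block_sum(const float* in, float* partial, int n) {
    extern __shared__ float cache[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads past the end of the input contribute zero.
    cache[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: the number of active threads halves at every step.
    // Assumes blockDim.x is a power of two.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            cache[tid] += cache[tid + stride];
        }
        __syncthreads();
    }

    // Thread 0 writes the block's result to global memory.
    if (tid == 0) {
        partial[blockIdx.x] = cache[0];
    }
}

// Illustrative host wrapper (hypothetical name): copies the input to the GPU,
// launches the kernel, copies the partial sums back and adds them on the CPU.
float reduce_sum(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    if (n == 0) return 0.0f;

    const int threads = 256;  // assumed block size, a power of two
    const int blocks = (n + threads - 1) / threads;

    float* d_in = nullptr;
    float* d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));

    // CPU -> GPU transfer of the input data.
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Third launch parameter: bytes of dynamic shared memory per block.
    block_sum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_partial, n);

    // GPU -> CPU transfer; this blocking copy also waits for the kernel to finish.
    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), d_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_partial);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}
```

Calling `reduce_sum` on a `std::vector<float>` returns the sum of its elements. Finishing the reduction on the CPU keeps the sketch short; a second kernel launch over the partial sums would be one alternative for large inputs.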
Day 2
CUDA memory hierarchy (30 min):
- Memory levels
- Register file, constant memory
- Global memory
- Shared memory
- Texture memory
- Unified memory
CUDA libraries (20 min):
- cuBLAS
- cuRAND
- cuFFT
- cuSPARSE
CUDA interaction with computer graphics (20 min):
- OpenGL interoperability
- Image filtering
CUDA applications in machine learning (20 min):
- cuDNN deep neural network library
- Machine learning library TensorFlow
Practical task: exercises in using CUDA development tools (2 h):
- Data processing using cuBLAS and cuRAND
- Image processing in a GPU environment
Part II: “Applied use of the high-performance computing technology CUDA”
This course continues the previous course, “Introduction to high-performance computing technology CUDA”, focusing in particular on CUDA on multi-GPU systems and on CUDA cloud computing in a remote server environment. The course concludes with practical CUDA exercises on the RTU HPC cluster.
Day 1
Efficient use of CUDA memory (30 min):
- Textures, arrays and possibilities of using them
- Sharing of CUDA memory
- CUDA unified memory
CUDA streams and events (30 min):
- Streams and events for parallel execution
- Asynchronous data copying
- Concurrent kernel execution and data transfer
Debugging and profiling of CUDA software (30 min):
- Performance evaluation and metrics
- Overview of the NVIDIA Nsight tools for debugging and profiling
- Debugging of CUDA software
- Profiling of CUDA performance
Practical task: exercises in developing applied CUDA programs (2 h):
- N-body interaction simulation
- Interactive fluid simulation
- Implementation of a normalised correlation algorithm for image processing
Day 2
CUDA on multi-GPU systems (30 min):
- Data exchange between GPUs
- Synchronisation of execution
CUDA in a remote server environment (1 h):
- CUDA as a cloud computing service
- Operating principles and architecture of an HPC cluster
- Parallel execution of CUDA jobs on the RTU HPC cluster
Practical task: exercises in developing applied CUDA programs (2 h):
- CUDA exercises on the RTU HPC cluster
- Photorealistic rendering in the CUDA environment