Part I “Introduction to high-performance computing technology CUDA”

This course covers the theoretical and practical principles of the massively parallel approach to high-performance computing, using multiprocessor systems and/or a combination of GPU hardware and a specialised software environment. The seminar gives an overview of high-performance computing hardware and software architectures, computing algorithms, application libraries and tools. Particular attention is paid to applied interdisciplinary uses of the GPU-based parallel computing platform CUDA, e.g., analysis of large data sets, image processing, and machine learning tasks. Alongside the theory, participants can acquire basic skills in developing IT solutions with CUDA.

Day 1

CUDA architecture (30 min):

History of GPU development
Types of GPU architecture supported by CUDA

CUDA programming (30 min):

CUDA programming model
Basic principles of CUDA programming
Concepts of threads and blocks
GPU and CPU data exchange

Parallel algorithms in the CUDA environment (30 min):

Parallel reduction
Prefix sum (scan)

Practical task: exercises in developing simple CUDA programs (2 h)

Day 2

CUDA memory hierarchy (30 min):

Memory levels
Register file, constant memory
Global memory
Shared memory
Texture memory
Unified memory

CUDA libraries (20 min):

cuBLAS
cuRAND
cuFFT
cuSPARSE

CUDA interaction with computer graphics (20 min):

OpenGL interoperability
Image filter

CUDA application in machine learning (20 min):

Deep Neural Network Library cuDNN
Machine learning library TensorFlow

Practical task: exercises in using CUDA development tools (2 h):

Data processing using cuBLAS and cuRAND
Image processing in a GPU environment

Part II “Applied use of the high-performance computing technology CUDA”

This course continues the previous course, “Introduction to high-performance computing technology CUDA”, focusing in particular on using CUDA on multi-GPU systems and on CUDA cloud computing in a remote server environment. The course concludes with practical CUDA exercises on the RTU HPC cluster.

Day 1

Efficient use of CUDA memory (30 min):

Textures and arrays, and ways of using them
Sharing of CUDA memory
CUDA unified memory

CUDA streams and events (30 min):

Streams and events for parallel execution
Asynchronous data copying
Concurrent kernel execution and data transfer

Debugging and profiling of CUDA software (30 min):

Performance evaluation and metrics
Overview of the NVIDIA Nsight tools for debugging and profiling
Debugging of CUDA software
Profiling of CUDA performance

Practical task: exercises in developing applied CUDA programs (2 h):

N-body interaction simulation
Interactive fluid simulation
Implementation of the normalised correlation algorithm in image processing

Day 2

CUDA in multi-GPU systems (30 min):

Data exchange between GPUs
Synchronisation of execution

CUDA in a remote server environment (1 h):

CUDA as a cloud computing service
Operating principles and architecture of an HPC cluster
Parallel execution of CUDA jobs on the RTU HPC cluster

Practical task: exercises in developing applied CUDA programs (2 h):

CUDA exercises on the RTU HPC cluster
Photorealistic rendering in a CUDA environment
