Massively Parallel GPU Computing with CUDA: Introduction

Instructor:              Assoc. Prof. Dr. sc. ing. Arnis Lektauers

Prerequisites:        Some programming experience in C/C++ and Python, as well as knowledge of parallel/threaded programming models, would be useful

Duration:                8 hours over two days

Format:                   Online lectures and hands-on practice

Target audience:  Scientists and programmers who want to use CUDA for scientific application development.

Number of Participants: 5-15

The training course covers the theoretical and practical principles of massively parallel GPU computing with CUDA technology, built around hands-on exercises of increasing complexity that highlight the capabilities of parallel computing. The course discusses the CUDA hardware and software architecture, memory management, parallel computing with C/C++ and Python, and common application libraries and tools. Additional attention is paid to the possibilities and advantages of using CUDA in machine learning solutions.

Remote access to RTU training machines will be provided (RDP and SSH client software is needed on the participants' computers). Alternatively, attendees can use their own computers with CUDA-compatible GPUs (CUDA >= 8.0) for the course.

Learning Outcomes:

At the end of the course, attendees should be able to make an informed decision on how to approach GPU parallelisation in their applications in an efficient and portable manner.

Course content

Day I

  1. Overview of CUDA architecture and programming model:
    1. GPU evolution
    2. CUDA GPU architecture
  2. Basic CUDA programming:
    1. Brief review of the CUDA programming model
    2. Key principles
    3. Introduction to the concept of threads & blocks
    4. Host-device data transfer
  3. Hands-on exercises on writing simple CUDA programs:
    1. Using CUDA on HPC cluster
    2. Simple programs with C/C++
    3. Simple programs with Python and CuPy
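As a taste of the Day I topics above (threads & blocks, host-device data transfer), the ideas typically come together in the classic vector-addition example. A minimal sketch in CUDA C follows; the kernel and variable names are illustrative, and it assumes a CUDA-capable GPU and the `nvcc` compiler:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread computes one output element; its global index is derived
// from the block index, block size, and thread index within the block.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard against the last, partially filled block
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host buffers
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Device buffers and host-to-device transfer
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Device-to-host transfer (waits for the kernel to finish)
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);    // expect 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```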

Day II

  1. Overview of CUDA memory hierarchy:
    1. An overview of memory levels
    2. Global memory
    3. Registers, constant memory, texture memory
    4. Shared memory and synchronization
  2. Introduction to CUDA Deep Neural Network library (cuDNN):
    1. Using cuDNN for deep neural networks
    2. Convolutional neural networks in cuDNN
    3. Integration with other CUDA libraries (cuBLAS, cuSOLVER, cuRAND, cuTENSOR, TensorRT)
  3. Exercises on CUDA techniques: neural network implementation
    1. Implementation from scratch with C/C++
    2. Implementation from scratch with Python and CuPy
    3. Implementation using cuDNN
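The shared-memory and synchronization topics listed for Day II are often introduced with a block-level parallel reduction. A minimal sketch of such a kernel follows (names are illustrative): `__shared__` declares fast on-chip storage private to each block, and `__syncthreads()` is the barrier that keeps the block's threads in step between phases:

```cuda
#include <cuda_runtime.h>

#define BLOCK 256

// Each block reduces up to BLOCK input elements to one partial sum.
// A second pass (or a host-side sum) combines the per-block results.
__global__ void blockSum(const float *in, float *partial, int n) {
    __shared__ float buf[BLOCK];      // one copy per block, shared by its threads
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    buf[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                  // all loads must finish before reducing

    // Tree reduction: halve the number of active threads each step
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            buf[tid] += buf[tid + stride];
        __syncthreads();              // barrier between reduction steps
    }

    if (tid == 0)
        partial[blockIdx.x] = buf[0]; // one result per block
}
```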

Here you can see the presentation from Day 1: CUDA_day_I

Here you can see the presentation from Day 2: CUDA_day_II

The CoE RAISE project has received funding from the European Union’s Horizon 2020 – Research and Innovation Framework Programme H2020-INFRAEDI-2019-1 under grant agreement No. 951733.