GPU code generation parameters

To achieve the best possible performance whilst being portable, GPU code should be generated for the architecture(s) it will be executed upon.

This is controlled by passing -gencode arguments to NVCC. Unlike the -arch and -code arguments, -gencode can be repeated, which allows for ‘fatbinary’ executables that are optimised for multiple device architectures.

Each -gencode argument requires two values, the virtual architecture and the real architecture, for use in NVCC’s two-stage compilation. For example, -gencode=arch=compute_60,code=sm_60 specifies a virtual architecture of compute_60 and a real architecture of sm_60. Note that there must be no space after the comma.
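As a minimal sketch of how several -gencode arguments can be assembled for a fatbinary build, the helper below emits one argument per compute capability. The function name gencode_flags and the values 35 and 60 are placeholders chosen for illustration; they are not the actual list of GPUs on any particular cluster.

```shell
# Hypothetical helper: print one -gencode argument per compute capability.
# Arguments are compute capabilities without the dot, e.g. 60 for 6.0.
gencode_flags() {
    flags=""
    for cc in "$@"; do
        # Pair each virtual architecture (compute_XX) with the matching
        # real architecture (sm_XX); separate arguments with one space.
        flags="${flags:+$flags }-gencode=arch=compute_${cc},code=sm_${cc}"
    done
    printf '%s\n' "$flags"
}

# Example with placeholder architectures (an assumption, not a cluster list).
gencode_flags 35 60
```

The printed string can then be passed to nvcc, for instance as nvcc $(gencode_flags 35 60) followed by the usual source and output arguments.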

The lowest virtual architecture specified must be less than or equal to the Compute Capability of the GPU used to execute the code.
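The rule above can be expressed as a small check. This is only a sketch of that single condition, assuming compute capabilities are written without the dot (60 for 6.0); the function name can_run is hypothetical, and the check does not query a real device.

```shell
# Hypothetical check: succeeds if the lowest virtual architecture in the
# build is less than or equal to the compute capability of the device.
# Usage: can_run DEVICE_CC BUILD_CC [BUILD_CC ...]
can_run() {
    device_cc="$1"
    shift
    min_cc=""
    for cc in "$@"; do
        # Track the smallest virtual architecture seen so far.
        if [ -z "$min_cc" ] || [ "$cc" -lt "$min_cc" ]; then
            min_cc="$cc"
        fi
    done
    # Fail if no architectures were given, or if the minimum is too high.
    [ -n "$min_cc" ] && [ "$min_cc" -le "$device_cc" ]
}

# Example with placeholder values: a build for 35 and 60 on a 6.0 device.
if can_run 60 35 60; then
    echo "compatible"
else
    echo "not compatible"
fi
```

With these placeholder values the build satisfies the rule, whereas the same build on a device of compute capability 3.0 would not.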

To build a CUDA application that targets any GPU on the HPC cluster “Rudens”, use the following -gencode arguments (for CUDA 8.0):

nvcc /