CUDA concepts I will forget

Simone's Lecture: https://indico.cern.ch/event/1544966/

Talk 1

A GPU has many "streaming multiprocessors" (SMs), loosely called "cores" (the columns in the schematic on slide 31); adjacent elements are accessed by adjacent threads. To use a GPU we always need to interface it with a CPU, which communicates with the memory (disk).
- Host: CPU
- Device: GPU
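A minimal sketch for checking how many SMs the local GPU has, using the CUDA runtime calls cudaGetDeviceCount and cudaGetDeviceProperties. The helper name sm_count is my own and not from the lecture, and the code assumes device 0 is the GPU of interest.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the lecture): number of streaming
// multiprocessors on `device`, or -1 if the query fails
// (for example when no CUDA-capable GPU is present).
int sm_count(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}

int main() {
    int n_devices = 0;
    if (cudaGetDeviceCount(&n_devices) != cudaSuccess || n_devices == 0) {
        std::printf("No CUDA device found\n");
        return 1;
    }
    // All of this runs on the host; the runtime only queries the device.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    std::printf("Device 0: %s, %d SMs\n", prop.name, sm_count(0));
    return 0;
}
```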

Code example:

#include <iostream>

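// Empty kernel: runs on the device, launched from the host
__global__ void mykernel() {}

int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    // Asynchronous launch: the host continues immediately
    mykernel<<<1,1,0,stream>>>();
    std::cout << "Hello World!\n";
    // Wait for the kernel to finish before destroying the stream
    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
}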

The function that runs on the GPU (the kernel) is declared with the CUDA keyword __global__, which marks a function that is called from host code and runs on the device. Triple angle brackets mark a call from host code to device code (a kernel launch).

mykernel<<<1,1,0,stream>>>();

Launch parameters, in order:
1 block
1 thread per block
0 bytes of dynamic shared memory
stream on which the kernel is queued
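To see what the first two launch parameters do, here is a small sketch in which every thread prints its own coordinates using the built-in variables blockIdx, gridDim, threadIdx and blockDim. The names whoami and run_launch are made up for this note, and the 2 blocks x 4 threads configuration is an arbitrary choice.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread reports its block and thread index.
__global__ void whoami() {
    printf("block %u of %u, thread %u of %u\n",
           blockIdx.x, gridDim.x, threadIdx.x, blockDim.x);
}

// Hypothetical helper: launches blocks x threads_per_block threads on a
// dedicated stream. Returns 0 on success, 1 on any CUDA error.
int run_launch(int blocks, int threads_per_block) {
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return 1;
    // <<<number of blocks, threads per block, shared memory bytes, stream>>>
    whoami<<<blocks, threads_per_block, 0, stream>>>();
    cudaError_t launch_err = cudaGetLastError();          // bad launch configuration?
    cudaError_t sync_err = cudaStreamSynchronize(stream); // wait, flush device printf
    cudaStreamDestroy(stream);
    return (launch_err == cudaSuccess && sync_err == cudaSuccess) ? 0 : 1;
}

int main() {
    // 2 blocks of 4 threads each: 8 lines of output in total
    return run_launch(2, 4);
}
```

The order of the printed lines across blocks is not guaranteed, since blocks are scheduled independently.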

To compile:

nvcc script.cu

To execute:

./a.out

The nvcc compiler separates source code into host and device components:
- Device (GPU) functions (e.g. mykernel()) are processed by the nvcc compiler
- Host (CPU) functions (e.g. main()) are processed by the host compiler (e.g. gcc)

Example: Add function

__global__ void add(const int *a, const int *b, int *c) { *c = *a + *b; }

The pointers a, b, c must point to device memory.
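A possible host-side wrapper for this kernel, as a sketch: allocate device memory with cudaMalloc, copy the inputs over with cudaMemcpy, launch, and copy the result back. The helper name add_on_device and the inputs 2 and 7 are my own choices. It uses the default stream for brevity rather than an explicit stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add(const int *a, const int *b, int *c) { *c = *a + *b; }

// Hypothetical helper: computes a + b on the device and stores it in c.
// Returns true on success.
bool add_on_device(int a, int b, int &c) {
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    // Device copies of a, b and c
    bool ok = cudaMalloc(&d_a, sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_b, sizeof(int)) == cudaSuccess
           && cudaMalloc(&d_c, sizeof(int)) == cudaSuccess;
    if (ok) {
        // Host -> device
        cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);
        add<<<1, 1>>>(d_a, d_b, d_c);
        // Device -> host; a blocking cudaMemcpy on the default stream
        // waits for the kernel to finish first
        ok = cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // cudaFree(nullptr) is a no-op, so this is safe after a failed allocation
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}

int main() {
    int c = 0;
    if (!add_on_device(2, 7, c)) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("2 + 7 = %d\n", c);
    return 0;
}
```

Passing host pointers such as &a directly to the kernel would not work here, because the kernel dereferences them on the device.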

To compile:

nvcc -std=c++20 cuda_mem_model.cu -o ex01