CUDA concepts I will forget
Simone's Lecture: https://indico.cern.ch/event/1544966/
Talk 1
A GPU has many Streaming Multiprocessors (SMs, the columns in the schematic on slide 31), each containing many simple cores. For efficient memory access, adjacent elements should be accessed by adjacent threads. A GPU cannot work on its own: it is always driven by a CPU, which manages the host memory and I/O (e.g. the disk).
- Host: CPU
- Device: GPU
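A minimal sketch for checking how many SMs a given GPU has, using the CUDA runtime call cudaGetDeviceProperties. It assumes a CUDA-capable GPU at device index 0; the helper name sm_count is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of Streaming Multiprocessors on device 0, or -1 on error.
int sm_count() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}

int main() {
    int n = sm_count();
    if (n < 0) {
        std::printf("no CUDA device found\n");
        return 1;
    }
    std::printf("device 0 has %d SMs\n", n);
    return 0;
}
```

The query runs entirely on the host: the CPU asks the driver about the device, no kernel is launched.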
Code example:
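#include <iostream>
#include <cuda_runtime.h>
// __global__: runs on the device, launched from host code
__global__ void mykernel() {}
int main() {
    cudaStream_t stream;
    cudaStreamCreate(&stream);          // create a CUDA stream
    mykernel<<<1,1,0,stream>>>();       // asynchronous launch: returns immediately
    std::cout << "Hello World!\n";      // runs on the CPU while the kernel is queued
    cudaStreamSynchronize(stream);      // wait for all work queued in the stream
    cudaStreamDestroy(stream);          // release the stream
    return 0;
}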
A function that runs on the GPU (a kernel) is marked with the CUDA keyword __global__, which declares a function that is called from host code and runs on the device. The triple angle brackets <<<...>>> mark a kernel launch, i.e. a call from host code to device code.
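A sketch of a kernel that does something visible, using device-side printf. Because the launch returns immediately, the host has to synchronize before the GPU output is guaranteed to appear. The names hello_kernel and run_hello are hypothetical, and checking errors with cudaGetLastError followed by cudaDeviceSynchronize is one common pattern, not necessarily the lecture's.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// __global__: called from host code, executed on the device.
__global__ void hello_kernel() {
    // printf is available in device code; output is flushed at synchronization points.
    printf("Hello from the GPU\n");
}

// Hypothetical helper: launches the kernel, waits for it, returns true if no CUDA error occurred.
bool run_hello() {
    hello_kernel<<<1, 1>>>();                      // host -> device launch
    cudaError_t err = cudaGetLastError();          // catches launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // catches execution errors
    if (err != cudaSuccess) std::printf("CUDA error: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess;
}

int main() {
    return run_hello() ? 0 : 1;
}
```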
mykernel<<<1,1,0,stream>>>();
- 1 block (number of blocks in the grid)
- 1 thread per block
- 0 bytes of dynamic shared memory
- stream: the CUDA stream the kernel is queued on
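A sketch of a launch with more than one block, using the same four launch parameters. Each thread computes its global index from blockIdx, blockDim and threadIdx, so adjacent threads write adjacent elements. The kernel write_index, the helper check_indices and the sizes used (1000 elements, 256 threads per block) are illustrative assumptions.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread writes its global index: adjacent threads touch adjacent elements.
__global__ void write_index(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) out[i] = i;                          // guard: the grid may overshoot n
}

// Hypothetical helper: launches write_index with <<<blocks, threads, 0, stream>>>
// and returns true if out[i] == i for every i after copying back.
bool check_indices(int n, int threads_per_block) {
    int blocks = (n + threads_per_block - 1) / threads_per_block;  // round up
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    cudaStream_t stream;
    cudaStreamCreate(&stream);
    write_index<<<blocks, threads_per_block, 0, stream>>>(d_out, n);

    std::vector<int> h_out(n);
    cudaMemcpyAsync(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost, stream);
    cudaError_t err = cudaStreamSynchronize(stream);  // wait for kernel + copy

    cudaStreamDestroy(stream);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != i) return false;
    return true;
}

int main() {
    // 1000 elements with 256 threads per block -> 4 blocks (1024 threads, 24 idle).
    bool ok = check_indices(1000, 256);
    std::printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

The bounds check in the kernel is needed because the number of threads is rounded up to a whole number of blocks.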
To compile:
nvcc script.cu
To execute:
./a.out
The nvcc compiler separates the source code into host and device components:
- Device (GPU) functions (e.g. mykernel()) are compiled by nvcc
- Host (CPU) functions (e.g. main()) are passed to the host compiler (e.g. gcc)
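To see the host/device split in action, a function can be marked __host__ __device__ so that nvcc produces both a host and a device version of it. A minimal sketch; square, square_kernel and square_on_device are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Compiled for both sides: the host copy goes to the host compiler, the device copy to the GPU.
__host__ __device__ int square(int x) { return x * x; }

// Device entry point; calls the device version of square().
__global__ void square_kernel(int x, int* out) { *out = square(x); }

// Hypothetical helper: evaluates square(x) on the GPU and returns the result (-1 on error).
int square_on_device(int x) {
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    square_kernel<<<1, 1>>>(x, d_out);
    int h_out = -1;
    // cudaMemcpy is ordered after the kernel on the default stream and blocks until done.
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}

int main() {
    int host_result = square(7);              // host version
    int device_result = square_on_device(7);  // device version
    std::printf("host: %d, device: %d\n", host_result, device_result);
    return host_result == device_result ? 0 : 1;
}
```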
Example: Add function
__global__ void add(const int *a, const int *b, int *c) { *c = *a + *b; }
The pointers a, b, c must point to device memory.
To compile:
nvcc -std=c++20 cuda_mem_model.cu -o ex01
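A minimal sketch of a complete program around the add kernel, assuming the inputs start on the host: allocate device memory with cudaMalloc, copy the inputs over with cudaMemcpy, launch with <<<1,1>>>, and copy the result back. The helper add_on_device and the input values are illustrative; the program should build with the nvcc command above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add(const int *a, const int *b, int *c) { *c = *a + *b; }

// Hypothetical helper: copies a and b to device memory, runs add<<<1,1>>>,
// copies the result into *c. Returns true on success.
bool add_on_device(int a, int b, int *c) {
    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    // Allocate device memory: the kernel can only dereference device pointers.
    cudaMalloc(&d_a, sizeof(int));
    cudaMalloc(&d_b, sizeof(int));
    cudaMalloc(&d_c, sizeof(int));

    // Host -> device copies of the inputs.
    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    add<<<1, 1>>>(d_a, d_b, d_c);

    // Device -> host copy of the result (waits for the kernel on the default stream).
    cudaError_t err = cudaMemcpy(c, d_c, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return err == cudaSuccess;
}

int main() {
    int c = 0;
    if (!add_on_device(2, 7, &c)) {
        std::printf("CUDA error\n");
        return 1;
    }
    std::printf("2 + 7 = %d\n", c);  // on a working GPU: 2 + 7 = 9
    return 0;
}
```

Passing host addresses such as &a directly to add would be invalid, which is why every operand goes through cudaMalloc and cudaMemcpy first.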