A Minimal Introductory Tutorial to CUDA Programming - Zhihu (知乎)

Summary | 19 Annotations
host refers to the CPU and its memory
2020/09/02 11:25
device refers to the GPU and its memory
2020/09/02 11:25
<<<grid, block>>> specifies how many threads the kernel will run with
2020/09/02 11:25
A kernel function is declared with the __global__ qualifier
2020/09/02 11:25
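
The two highlights above (the `__global__` qualifier and the `<<<grid, block>>>` launch syntax) can be sketched together as follows. This is a minimal sketch, not code from the tutorial: the kernel name `fill_with_index`, the host helper `launch_and_sum`, and the block size of 256 are assumptions chosen for illustration, and `n` is assumed to be positive.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel: declared with __global__, executed on the device, must return void.
__global__ void fill_with_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) out[i] = i;                          // guard the tail block
}

// Hypothetical host helper: launches the kernel over n elements (n > 0)
// and returns the sum of the results, or -1 on a CUDA error.
long long launch_and_sum(int n) {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    dim3 block(256);                         // threads per block (assumed value)
    dim3 grid((n + block.x - 1) / block.x);  // enough blocks to cover n
    fill_with_index<<<grid, block>>>(d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    long long sum = 0;
    for (int v : h_out) sum += v;
    return sum;
}
```

Here the launch creates `grid.x * block.x` threads in total, so some threads in the last block may fall beyond `n`; the `if (i < n)` check keeps them from writing out of bounds.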
each thread
2020/09/02 11:25
threadIdx
2020/09/02 11:25
The host does not wait for the kernel to finish before moving on to the next step
2020/09/02 11:26
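
Because the launch returns immediately, the host has to synchronize explicitly before it can rely on the kernel's results. The following is a rough sketch of that pattern; `add_one` and `add_one_on_device` are hypothetical names introduced here, not part of the tutorial.

```cuda
#include <cuda_runtime.h>

// Trivial kernel: increments a single integer in device memory.
__global__ void add_one(int *x) { *x += 1; }

// Hypothetical host helper: returns start + 1 computed on the device,
// or -1 if any CUDA call fails.
int add_one_on_device(int start) {
    int *d_x = nullptr;
    if (cudaMalloc(&d_x, sizeof(int)) != cudaSuccess) return -1;
    if (cudaMemcpy(d_x, &start, sizeof(int), cudaMemcpyHostToDevice) != cudaSuccess) {
        cudaFree(d_x);
        return -1;
    }

    add_one<<<1, 1>>>(d_x);  // returns to the host right away

    // Errors in the launch itself (e.g. a bad configuration) show up here.
    cudaError_t launch_err = cudaGetLastError();
    // Block the host until the kernel has actually finished.
    cudaError_t sync_err = cudaDeviceSynchronize();

    int result = -1;
    if (launch_err == cudaSuccess && sync_err == cudaSuccess) {
        if (cudaMemcpy(&result, d_x, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
            result = -1;
    }
    cudaFree(d_x);
    return result;
}
```

Any host work placed between the launch and `cudaDeviceSynchronize()` would overlap with the kernel, which is the practical benefit of the asynchronous launch.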
but [__host__] can be combined with __device__, in which case the function is compiled for both the device and the host
2020/09/02 11:26
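
As an illustration of combining the two qualifiers, the sketch below defines one function that is callable from both host code and kernels. The names `square`, `square_kernel`, and `host_and_device_agree` are assumptions made for this example.

```cuda
#include <cuda_runtime.h>

// Compiled twice: once for the host and once for the device.
__host__ __device__ inline float square(float x) { return x * x; }

// Kernel that calls the shared function on the device.
__global__ void square_kernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square(data[i]);
}

// Hypothetical check: squares `value` on the device and compares the result
// with the same function called on the host. Returns false on a CUDA error.
bool host_and_device_agree(float value) {
    float *d = nullptr;
    if (cudaMalloc(&d, sizeof(float)) != cudaSuccess) return false;
    cudaMemcpy(d, &value, sizeof(float), cudaMemcpyHostToDevice);

    square_kernel<<<1, 32>>>(d, 1);

    float from_device = 0.0f;
    cudaError_t err = cudaMemcpy(&from_device, d, sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    return err == cudaSuccess && from_device == square(value);
}
```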
When a kernel executes on the device, it actually launches many threads
2020/09/02 11:28
A thread block contains many
2020/09/02 11:28
threads
2020/09/02 11:28
blockIdx
2020/09/02 11:29
threadIdx
2020/09/02 11:29
blockDim
2020/09/02 11:29
gridDim
2020/09/02 11:30
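
The four built-in variables highlighted above (`blockIdx`, `threadIdx`, `blockDim`, `gridDim`) are typically combined to give each thread a unique global index. The sketch below uses a grid-stride loop, a common idiom rather than something taken from the tutorial, so that a deliberately small grid still covers every element; `scale` and `count_scaled` are hypothetical names, and `n` is assumed to be positive.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void scale(float *data, int n, float factor) {
    // Position of this thread within the whole grid.
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    // Total number of threads in the grid.
    int stride = gridDim.x * blockDim.x;
    // Each thread handles elements global_id, global_id + stride, ...
    for (int i = global_id; i < n; i += stride) data[i] *= factor;
}

// Hypothetical host helper: scales n ones by `factor` using only 2 blocks of
// 64 threads, then counts how many elements equal `factor`. Returns -1 on error.
int count_scaled(int n, float factor) {
    std::vector<float> h(n, 1.0f);
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemcpy(d, h.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<2, 64>>>(d, n, factor);

    cudaError_t err = cudaMemcpy(h.data(), d, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d);
    if (err != cudaSuccess) return -1;

    int count = 0;
    for (float v : h) count += (v == factor) ? 1 : 0;
    return count;
}
```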
A thread block can only be scheduled on a single SM. An SM can generally schedule multiple thread blocks
2020/09/02 11:33
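
How many blocks one SM can hold at a time depends on the kernel and the hardware, and the CUDA runtime can report it. The following sketch queries the SM count and the number of resident blocks per SM for a placeholder kernel; `dummy_kernel`, `sm_count`, and `resident_blocks_per_sm` are hypothetical names, and device 0 is assumed.

```cuda
#include <cuda_runtime.h>

// Placeholder kernel; occupancy depends on its register and shared-memory use.
__global__ void dummy_kernel(float *data) {
    data[blockIdx.x * blockDim.x + threadIdx.x] = 0.0f;
}

// Number of SMs on device 0, or -1 on error.
int sm_count() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.multiProcessorCount;
}

// Maximum number of blocks of `block_size` threads that can be resident
// on one SM at the same time for dummy_kernel, or -1 on error.
int resident_blocks_per_sm(int block_size) {
    int blocks = 0;
    cudaError_t err = cudaOccupancyMaxActiveBlocksPerMultiprocessor(
        &blocks, dummy_kernel, block_size, 0 /* dynamic shared memory bytes */);
    return err == cudaSuccess ? blocks : -1;
}
```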
The block size should generally be set to a multiple of 32
2020/09/02 11:34
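
The multiple-of-32 guideline follows from the warp size, which is 32 threads on current NVIDIA GPUs. A small sketch of the arithmetic usually involved when picking launch dimensions is shown below; the helper names are assumptions, and the default warp size of 32 can be checked against the value reported by the device.

```cuda
#include <cuda_runtime.h>

// Round a requested thread count up to the next multiple of the warp size.
int round_up_to_warp(int threads, int warp_size = 32) {
    return ((threads + warp_size - 1) / warp_size) * warp_size;
}

// Number of blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int block_size) {
    return (n + block_size - 1) / block_size;
}

// Warp size reported by device 0, or -1 on error.
int device_warp_size() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return -1;
    return prop.warpSize;
}
```

For example, a request for 100 threads per block rounds up to 128, and covering 1000 elements with blocks of 128 threads takes 8 blocks.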
Use the cudaMallocManaged function to allocate managed memory
2020/09/02 11:36
Because managed memory transfers data automatically, cudaDeviceSynchronize() must be called here to keep the device and host synchronized
2020/09/02 11:37
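
The last two highlights can be sketched as a vector addition on managed memory. This is an illustrative sketch under stated assumptions rather than the tutorial's exact code: the kernel name `add`, the helper `managed_add_ok`, the input values, and the block size of 256 are all chosen here, and `n` is assumed to be positive.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

__global__ void add(const float *x, const float *y, float *z, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) z[i] = x[i] + y[i];
}

// Hypothetical host helper: adds two managed arrays of length n (n > 0)
// on the device and verifies the result on the host.
bool managed_add_ok(int n) {
    float *x = nullptr, *y = nullptr, *z = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(float);

    // Managed memory is reachable from both host and device code.
    bool ok = cudaMallocManaged(&x, bytes) == cudaSuccess &&
              cudaMallocManaged(&y, bytes) == cudaSuccess &&
              cudaMallocManaged(&z, bytes) == cudaSuccess;
    if (ok) {
        for (int i = 0; i < n; ++i) { x[i] = 10.0f; y[i] = 20.0f; }

        int block = 256;
        int grid = (n + block - 1) / block;
        add<<<grid, block>>>(x, y, z, n);

        // No explicit cudaMemcpy is needed, but the host must wait for the
        // kernel before it reads z.
        ok = cudaDeviceSynchronize() == cudaSuccess;
        for (int i = 0; ok && i < n; ++i) ok = (z[i] == 30.0f);
    }
    // cudaFree accepts null pointers, so partial allocations are handled too.
    cudaFree(x);
    cudaFree(y);
    cudaFree(z);
    return ok;
}
```

Without the `cudaDeviceSynchronize()` call, the host loop could read `z` while the kernel is still running, since the launch itself does not block.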