CUDA notes

Author: leenldk

Date: 2025-02-11

Category: Uncategorized

Each host thread is bound to one device at a time (its current device).

__host__ cudaError_t cudaSetDevice ( int device ) sets the device used by the calling host thread.
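A minimal sketch of the per-thread binding, assuming at least one CUDA device is present. The helper name selectDevice is made up for this example; cudaGetDeviceCount and cudaGetDevice are the runtime calls used to check what the calling thread is currently bound to.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: binds the calling host thread to `device` and returns
// the device that cudaGetDevice reports afterwards, or -1 on any error.
int selectDevice(int device) {
    if (cudaSetDevice(device) != cudaSuccess) return -1;
    int current = -1;
    if (cudaGetDevice(&current) != cudaSuccess) return -1;
    return current;
}

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("no CUDA device available\n");
        return 1;
    }
    // Walk over every visible device and bind this thread to each in turn.
    for (int d = 0; d < count; ++d) {
        std::printf("requested device %d, current device %d\n", d, selectDevice(d));
    }
    return 0;
}
```

Because the binding is per thread, a multi-threaded host program would call cudaSetDevice in each thread that needs a device other than the default.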

Memory

managed memory

Managed memory is accessed through the same pointer (a single virtual address) on both host and device. It is built on a page-fault mechanism.

Allocate it with cudaMallocManaged.

When the GPU touches managed memory whose pages currently live on the CPU and takes a page fault, the following happens:

Allocate new pages on the GPU;
Unmap old pages on the CPU;
Copy data from the CPU to the GPU;
Map new pages on the GPU;
Free old CPU pages.
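The sequence above can be triggered with a few lines. This is a sketch, not a benchmark: the kernel addOne and the helper managedRoundTrip are names invented for the example, and on-demand page migration only applies on GPUs and platforms that support it (elsewhere the runtime migrates data around kernel launches, and the code still produces the same result).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: adds 1 to every element.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: fills a managed buffer on the host, updates it on the
// GPU, then sums it on the host. Returns the sum, or -1 on error.
long long managedRoundTrip(int n) {
    int *data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) return -1;

    for (int i = 0; i < n; ++i) data[i] = i;   // pages are first touched on the CPU

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    addOne<<<blocks, threads>>>(data, n);      // GPU access migrates the pages
    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += data[i]; // CPU access migrates them back
    cudaFree(data);
    return sum;
}

int main() {
    std::printf("sum = %lld\n", managedRoundTrip(1024));
    return 0;
}
```

Note that the same pointer is used by the host loops and by the kernel, with no explicit cudaMemcpy.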

Use __host__ cudaError_t cudaMemPrefetchAsync ( const void* devPtr, size_t count, int dstDevice, cudaStream_t stream = 0 ) to prefetch managed memory to the CPU or to a GPU. Pass cudaCpuDeviceId as dstDevice to target the CPU.
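A sketch of prefetching in both directions, assuming a toolkit that still exposes the int dstDevice signature quoted above (newer toolkits changed this signature, so check your headers). The kernel doubleAll and the helper prefetchRoundTrip are hypothetical names. Prefetching is skipped when the device does not report concurrent managed access, since the call is expected to fail there.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: doubles every element.
__global__ void doubleAll(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Hypothetical helper: returns the sum of the doubled buffer, or -1 on error.
long long prefetchRoundTrip(int n) {
    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return -1;

    // Prefetch is only attempted when the device supports concurrent managed access.
    int canPrefetch = 0;
    cudaDeviceGetAttribute(&canPrefetch, cudaDevAttrConcurrentManagedAccess, device);

    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *data = nullptr;
    if (cudaMallocManaged(&data, bytes) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) data[i] = i;

    // Move the pages to the GPU ahead of the kernel instead of faulting them in.
    if (canPrefetch) cudaMemPrefetchAsync(data, bytes, device, 0);

    int threads = 256;
    doubleAll<<<(n + threads - 1) / threads, threads>>>(data, n);

    // Move the pages back to the CPU before the host reads them.
    if (canPrefetch) cudaMemPrefetchAsync(data, bytes, cudaCpuDeviceId, 0);

    if (cudaDeviceSynchronize() != cudaSuccess) {
        cudaFree(data);
        return -1;
    }

    long long sum = 0;
    for (int i = 0; i < n; ++i) sum += data[i];
    cudaFree(data);
    return sum;
}

int main() {
    std::printf("sum = %lld\n", prefetchRoundTrip(1024));
    return 0;
}
```

All three operations are issued on the same (default) stream, so the kernel runs after the first prefetch and the second prefetch runs after the kernel.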

pinned host memory

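Host memory allocated the usual way (for example with malloc) is pageable by default.

The GPU cannot access pageable host memory directly, so a copy from pageable host memory is first staged into pinned memory and only then copied to the device.

cudaMallocHost allocates pinned host memory directly, which avoids the staging copy.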
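A minimal sketch of a host-to-device-to-host round trip through pinned buffers. The helper name pinnedRoundTrip is invented for the example; the buffers are released with cudaFreeHost, the counterpart of cudaMallocHost. Using cudaMemcpyAsync on a stream is one common reason to want pinned memory, though the example does not measure any speedup.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: copies n ints host -> device -> host using pinned host
// buffers. Returns the number of mismatching elements, or -1 on error.
int pinnedRoundTrip(int n) {
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *hostIn = nullptr, *hostOut = nullptr, *dev = nullptr;
    cudaStream_t stream = nullptr;

    bool ok = cudaMallocHost(&hostIn, bytes) == cudaSuccess &&   // pinned
              cudaMallocHost(&hostOut, bytes) == cudaSuccess &&  // pinned
              cudaMalloc(&dev, bytes) == cudaSuccess &&
              cudaStreamCreate(&stream) == cudaSuccess;

    int mismatches = -1;
    if (ok) {
        for (int i = 0; i < n; ++i) { hostIn[i] = i; hostOut[i] = 0; }

        // Both copies are queued on one stream, so they execute in order.
        cudaMemcpyAsync(dev, hostIn, bytes, cudaMemcpyHostToDevice, stream);
        cudaMemcpyAsync(hostOut, dev, bytes, cudaMemcpyDeviceToHost, stream);

        if (cudaStreamSynchronize(stream) == cudaSuccess) {
            mismatches = 0;
            for (int i = 0; i < n; ++i) {
                if (hostOut[i] != hostIn[i]) ++mismatches;
            }
        }
    }

    // Release whatever was successfully created.
    if (stream) cudaStreamDestroy(stream);
    if (dev) cudaFree(dev);
    if (hostOut) cudaFreeHost(hostOut);
    if (hostIn) cudaFreeHost(hostIn);
    return mismatches;
}

int main() {
    std::printf("mismatches = %d\n", pinnedRoundTrip(1 << 20));
    return 0;
}
```

Pinned memory is taken out of the pool the OS can page, so large pinned allocations are worth keeping to what the transfers actually need.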

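Command-line options

ncu --clock-control reset # reset the GPU clock lock left in place by profiling

The nvcc option -lineinfo, used together with ncu --import-source yes, keeps line information in the compiled binary and embeds the source files in the ncu profile report.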
