leenldk 发布的文章

论文阅读——GPU Virtualization and Scheduling Methods: A Comprehensive Survey

作者: leenldk
时间: 2022-09-30
分类: 未分类,论文阅读

一篇关于 GPU 虚拟化的 survey 文章，发表于 ACM Computing Surveys

Background 里面一段比较有意思的话：

On the contrary, the design of conventional processors is optimized
for reducing the execution time of sequential code on each core, thus adding complexity
to each core at the cost of offering fewer cores in the processor package. Conventional
processors typically use sophisticated control logic and large cache memories to efficiently
deal with conditional branches, pipeline stalls, and poor data locality.

传统处理器的目标是尽可能加速每个核心的串行执行时间，因此每个核心有大量处理分支预测，流水线延迟，data locality 相关的资源。代价是核心数量较少

动态链接库函数劫持

作者: leenldk
时间: 2022-09-28
分类: technique

之前做过劫持 cuda runtime 动态链接库的事情，在这里记录一下：

cudaError_t cudaMalloc ( void** devPtr, size_t size )
{
    if(!fp) {
        fp = fopen("/home/leenldk/sc/race/ASC2021-RACE/mem.out", "w");
    }
    cudaError_t (*lcudaMalloc) (void**, size_t) = (cudaError_t (*) (void**, size_t))dlsym(RTLD_NEXT, "cudaMalloc");
    cudaError_t ret = lcudaMalloc(devPtr, size);
    printf("cudaMalloc size : %#lx %#lx\n", size, (size_t)(*devPtr));
    fprintf(fp, "cudaMalloc size : %#lx %#lx\n", size, (size_t)(*devPtr));
    fflush(fp);
    return ret;
}

编译为动态链接库，使用 LD_PRELOAD 预加载

update 2022.11.2

C++ 动态库加载，卸载时调用函数：

static void init() __attribute__((constructor));
void init() {}
static void fini() __attribute__((destructor));
void fini() {}

linux 驱动相关

作者: leenldk
时间: 2022-09-22
分类: 未分类

linux 将外设抽象为 /dev 下的文件，通过统一的文件读写接口访问外设

linux与外设通信：

I/O 端口：通过 I/O 读写访问设备
I/O 内存映射：对 I/O 端口进行内存映射，将外设地址映射到内存地址，PCI 总线寻址通过内存映射完成
中断

- 阅读剩余部分 -

ROCm hip 相关

作者: leenldk
时间: 2022-09-21
分类: 未分类

memory

managed memory : 使用 linux heterogeneous memory management (HMM)， device 和 host 端可以以相同指针访问同一块内存

coherent memory : 可以在 kernel 运行时执行对 host 和其他 peer 可见的原子操作，通过不 cache 内存实现
non-coherent memory : device 端 cache 的内存，修改不实时可见

调度

direct dispatch : runtime 直接将操作发送至 AQL 队列
device side malloc

编译与链接

HIP 支持两种 static lib：

只包含 host 函数，可以使用 gcc 等非 hipcc 编译器链接
包含 device 函数，只能使用 hipcc 链接

CUDA api 相关

作者: leenldk
时间: 2022-09-17
分类: 未分类

driver api v.s. runtime api

driver api 更为细粒度，例如 runtime 中所有 kernel 初始化时自动 load 且程序运行时保持 load ，driver api 可以只保持当前需要 load 的 module

从用户接口上：
driver api 通常返回 CUresult
runtime api 通常返回 cudaError_t

- 阅读剩余部分 -

论文阅读——GPU Virtualization and Scheduling Methods: A Comprehensive Survey

动态链接库函数劫持

linux 驱动相关

ROCm hip 相关

memory

调度

编译与链接

CUDA api 相关

driver api v.s. runtime api

ヒトコト

my friends

最新文章

最近回复

分类

其它

归档