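Performance Analysis, Monitoring, and Debugging Tools

Author: leenldk
Date: 2022-11-12
Category: Uncategorized

perf

perf record [command] runs command, samples it, and writes the results to perf.data
perf report reads the samples in perf.data and presents a hotspot analysis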
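
The record-then-report workflow above can be sketched as two command lines. This is a minimal sketch rather than a complete profiling recipe: Python is used only to build and print the argument lists (nothing is executed), and `./my_app` with its arguments is a placeholder program that does not come from the original notes.

```python
import shlex


def perf_record_cmd(command, output="perf.data"):
    """Build the argv for `perf record`, which samples `command` into `output`."""
    # `--` separates perf's own options from the profiled command line.
    return ["perf", "record", "-o", output, "--", *command]


def perf_report_cmd(input_file="perf.data"):
    """Build the argv for `perf report`, which reads the samples back."""
    # `--stdio` prints a plain-text report instead of opening the interactive UI.
    return ["perf", "report", "-i", input_file, "--stdio"]


# ./my_app and its arguments are placeholders for the program being profiled.
record = perf_record_cmd(["./my_app", "--size", "1024"])
report = perf_report_cmd()

print(shlex.join(record))
print(shlex.join(report))
```

The two printed lines can be pasted into a shell as they are. `-o` and `-i` only spell out the default perf.data file name, so they can be dropped when the default is fine.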


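CUDA Compilation

Author: leenldk
Date: 2022-10-06
Category: Uncategorized

The nvcc compilation process

nvcc compiles device code in two stages
First, the .cu file is compiled to .ptx code targeting a virtual architecture (stage 1)
Then the .ptx is compiled to binary code targeting a real architecture (stage 2)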
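
To make the two stages concrete, here is a hedged sketch that spells each stage out as a separate command. Normally nvcc drives both stages itself. The file name `kernel.cu` and the architecture pair `compute_70` / `sm_70` are assumptions chosen for illustration, and Python is used only to assemble the argument lists, not to run them.

```python
def nvcc_two_stage_cmds(source, virtual_arch="compute_70", real_arch="sm_70"):
    """Return (stage1, stage2) argv lists for compiling `source` by hand."""
    stem = source.rsplit(".", 1)[0]
    ptx = stem + ".ptx"
    cubin = stem + ".cubin"
    # Stage 1: .cu -> .ptx for a virtual architecture (compute_XX).
    stage1 = ["nvcc", "-ptx", "-arch=" + virtual_arch, source, "-o", ptx]
    # Stage 2: .ptx -> binary for a real architecture (sm_XX), using the PTX assembler.
    stage2 = ["ptxas", "-arch=" + real_arch, ptx, "-o", cubin]
    return stage1, stage2


# kernel.cu is a placeholder source file.
stage1, stage2 = nvcc_two_stage_cmds("kernel.cu")
print(" ".join(stage1))
print(" ".join(stage2))
```

In everyday use both stages are usually requested in one nvcc invocation, for example with `-gencode arch=compute_70,code=sm_70`, where `arch` names the virtual architecture for stage 1 and `code` names the real architecture for stage 2.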


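CUDA CUPTI

Author: leenldk
Date: 2022-10-02
Category: Uncategorized

CUPTI: CUDA Profiling Tools Interface
Used to build profiling and tracing tools for CUDA applications

CDP: CUDA Dynamic Parallelism

CUPTI provides four APIs:

Activity API
Callback API
Event API
Metric API

CUPTI initializes lazily, on the first call to any CUPTI function
cuptiSubscribe(): call it first, so that multiple CUPTI clients do not interfere with each other

My current understanding: CUPTI has a client side and a server side. The server side records the events that occur on the CUDA device and on the CPU as CUpti_Activity records and stores them in activity buffers supplied by the client
CUPTI does not guarantee the order of activities in an activity buffer
The client controls when buffers are handed back with cuptiActivityFlushPeriod (periodic flushing) and cuptiActivityFlushAll (flush on demand)

CUPTI creates a worker thread to reduce its interference with the application threads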
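
The buffer hand-off described above can be illustrated with a toy model. This is not the CUPTI API, which is a C interface. Every name below (`Record`, `ToyProfiler`, `emit`, `flush_all`) is hypothetical, and the model only shows the shape of the protocol: the client supplies empty buffers, the server side fills them, completed buffers come back on a flush, and the client sorts by timestamp because buffer order is not guaranteed.

```python
from dataclasses import dataclass


@dataclass
class Record:
    kind: str      # e.g. "kernel" or "memcpy" (illustrative labels only)
    start_ns: int  # start timestamp in nanoseconds


class ToyProfiler:
    """Hypothetical stand-in for the server side; not the real CUPTI interface."""

    def __init__(self, request_buffer, complete_buffer, capacity=2):
        self.request_buffer = request_buffer    # client callback: returns an empty buffer
        self.complete_buffer = complete_buffer  # client callback: receives a filled buffer
        self.capacity = capacity
        self._current = None

    def emit(self, record):
        # Ask the client for a buffer only when one is needed.
        if self._current is None:
            self._current = self.request_buffer()
        self._current.append(record)
        if len(self._current) >= self.capacity:
            self.flush_all()

    def flush_all(self):
        # Hand the current buffer back to the client if it holds any records.
        if self._current:
            self.complete_buffer(self._current)
        self._current = None


received = []
profiler = ToyProfiler(request_buffer=list, complete_buffer=received.extend)
for kind, ts in [("kernel", 30), ("memcpy", 10), ("kernel", 20)]:
    profiler.emit(Record(kind, ts))
profiler.flush_all()

# Records arrive in emission order, not time order, so the client sorts them.
ordered = sorted(received, key=lambda r: r.start_ns)
print([r.start_ns for r in ordered])
```

In the real library the two client callbacks are registered with cuptiActivityRegisterCallbacks and the records are C structs read out of raw byte buffers. The toy model keeps only the control flow.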

Activity API

activity record: records an event; all record types use CUpti_Activity as their base type
activity buffer: transfers activity records from CUPTI to the client
Collection is enabled with cuptiActivityEnable or cuptiActivityEnableContext
activity kind

Paper Reading: GPU Virtualization and Scheduling Methods: A Comprehensive Survey

Author: leenldk
Date: 2022-09-30
Category: Uncategorized, Paper Reading

A survey paper on GPU virtualization, published in ACM Computing Surveys

An interesting passage from the Background section:

On the contrary, the design of conventional processors is optimized
for reducing the execution time of sequential code on each core, thus adding complexity
to each core at the cost of offering fewer cores in the processor package. Conventional
processors typically use sophisticated control logic and large cache memories to efficiently
deal with conditional branches, pipeline stalls, and poor data locality.

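Conventional processors aim to minimize the sequential execution time of each core, so each core spends a large share of its resources on handling conditional branches, pipeline stalls, and poor data locality. The cost is a smaller number of cores.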

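Linux Drivers

Author: leenldk
Date: 2022-09-22
Category: Uncategorized

Linux abstracts peripherals as files under /dev, so peripherals are accessed through the uniform file read/write interface

How Linux communicates with peripherals:

I/O ports: the device is accessed through port I/O reads and writes
Memory-mapped I/O: device addresses are mapped into the memory address space, so the device is accessed like memory; PCI bus addressing is done through memory mapping
Interrupts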
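
The "peripherals as files under /dev" abstraction mentioned above can be seen from user space with ordinary file calls. The sketch below assumes a Linux system and uses the pseudo-devices /dev/zero and /dev/null, which were picked for illustration because they exist on any Linux machine and need no special permissions. A real hardware device node is opened the same way, but its behavior depends on its driver.

```python
import os


def read_device(path, nbytes):
    """Read up to nbytes from a device node using plain file I/O."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, nbytes)
    finally:
        os.close(fd)


def write_device(path, data):
    """Write data to a device node and return the number of bytes accepted."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


zeros = read_device("/dev/zero", 8)         # the driver supplies zero bytes
written = write_device("/dev/null", b"hi")  # the driver discards the data
print(zeros, written)
```

The same open, read, write, and close calls work on a regular file. What differs is the kernel driver that services them behind the device node.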

