Articles in category: technique

Getting started with Rust

Author: leenldk
Date: 2021-02-24
Category: technique

“Damn, from now on I need to jot down every bit of syntax as soon as I come across it”

Importing a library: use std::io;
Generating documentation: cargo doc --open
The match keyword:

    match guess.cmp(&secret_number) {
        Ordering::Less => println!("Too small!"),
        Ordering::Greater => println!("Too big!"),
        Ordering::Equal => println!("You win!"),
    }

ownership:
- Each value has one variable that is its owner
- There can be only one owner at a time
- When the owner goes out of scope, the value is dropped
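
A minimal sketch of the three rules above. The function names consume and ownership_demo are made up for illustration, they are not from any library.

```rust
/// Takes ownership of `s`; the String is dropped when this function returns.
fn consume(s: String) -> usize {
    s.len()
}

/// Walks through the three ownership rules (hypothetical helper).
fn ownership_demo() -> (usize, String) {
    let a = String::from("hello"); // `a` owns the String
    let b = a; // ownership moves to `b`; `a` can no longer be used
    // println!("{}", a); // would not compile: value was moved

    let len = consume(b.clone()); // a clone is moved into `consume` and dropped there

    {
        let _temp = String::from("scoped"); // owned by `_temp`
    } // `_temp` goes out of scope here, so its String is freed

    (len, b) // ownership of `b` moves out to the caller
}
```

Calling ownership_demo() hands the surviving String back to the caller, which becomes its new owner.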

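Rust features:

Safety guaranteed by the ownership mechanism
Explicitly typed language, but with type inference
Scoping: a variable is dropped once it leaves the scope in which it was defined
No null values
No dangling pointers
No data races (which is why a value can have either one mutable reference or several immutable references at a time, never both)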
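
The "one mutable or many immutable references" rule can be sketched like this. borrow_demo is a hypothetical name; the commented-out line shows what the compiler rejects.

```rust
/// Shared borrows may coexist; a mutable borrow must be exclusive.
fn borrow_demo() -> Vec<i32> {
    let mut v = vec![1, 2, 3];

    // Several immutable references at the same time are fine.
    let first = &v[0];
    let last = &v[2];
    let sum = *first + *last; // last use of both shared borrows

    // Now that the shared borrows are no longer used, one mutable borrow is allowed.
    let m = &mut v;
    m.push(sum);
    // let again = &v; m.push(0); // would not compile: `v` is still mutably borrowed

    v
}
```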

A name ending in ! is a macro invocation

General syntax

for loops:

for x in 0..10 {
    println!("{}", x); // x: i32
}

// enumerate version
for (i,j) in (5..10).enumerate() { 
    println!("i = {} and j = {}", i, j);
}

Ranges: a..b denotes [a, b) and has type core::ops::Range, for which the standard library implements the Iterator trait
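
Because Range implements Iterator, the usual iterator adapters work on it directly. A small sketch (range_demo is a made-up name):

```rust
use std::ops::Range;

/// Uses a half-open range [0, 5) as an ordinary iterator.
fn range_demo() -> (Vec<i32>, i32, usize) {
    let r: Range<i32> = 0..5;

    let items: Vec<i32> = r.clone().collect(); // 5 is excluded
    let total: i32 = r.sum(); // consumes the range
    let count = (5..10).len(); // number of elements in [5, 10)

    (items, total, count)
}
```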

match statement:

match v.get(2) {
    Some(third) => println!("The third element is {}", third),
    None => println!("There is no third element."),
}

This is also an example of using Option<&T>
Option definition:

enum Option<T> {
    Some(T),
    None,
}

Result definition:

enum Result<T, E> {
    Ok(T),
    Err(E),
}
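
A sketch that combines both enums: Option from slice lookup and Result from parsing. The function third_doubled and its String error type are assumptions chosen for illustration.

```rust
/// Looks up the third element and parses it; returns twice its value.
fn third_doubled(v: &[&str]) -> Result<i32, String> {
    // v.get(2) returns Option<&&str>: None if the slice is too short.
    let item = match v.get(2) {
        Some(s) => *s,
        None => return Err(String::from("no third element")),
    };

    // str::parse returns Result<i32, ParseIntError>.
    match item.parse::<i32>() {
        Ok(n) => Ok(n * 2),
        Err(e) => Err(format!("cannot parse {:?}: {}", item, e)),
    }
}
```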

Struct definition and methods:

struct Point<T, U> {
    x: T,
    y: U,
}

impl<T, U> Point<T, U> {
    fn mixup<V, W>(self, other: Point<V, W>) -> Point<T, W> {
        Point {
            x: self.x,
            y: other.y,
        }
    }
}

tuple structs:

struct Color(i32, i32, i32);
struct Point(i32, i32, i32);

let black = Color(0, 0, 0);
let origin = Point(0, 0, 0);
println!("{} {} {}", black.0, black.1, black.2);

Defining and using a trait:

pub trait Summary {
    fn summarize(&self) -> String;
}
impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        format!("{}, by {} ({})", self.headline, self.author, self.location)
    }
}

Passing a trait as a parameter:

pub fn notify(item1: impl Summary, item2: impl Summary) {}
pub fn notify(item: impl Summary + Display) {}
pub fn notify<T: Summary + Display>(item: T) {}
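
The trait snippets above leave NewsArticle undefined and notify empty. A self-contained sketch, assuming NewsArticle has three String fields and that notify borrows its argument and returns a String (both are assumptions, as is the sample_article helper):

```rust
pub trait Summary {
    fn summarize(&self) -> String;
}

// Assumed shape of the struct; only the fields used by summarize are included.
pub struct NewsArticle {
    pub headline: String,
    pub author: String,
    pub location: String,
}

impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        format!("{}, by {} ({})", self.headline, self.author, self.location)
    }
}

// Accepts any type that implements Summary.
pub fn notify(item: &impl Summary) -> String {
    format!("Breaking news! {}", item.summarize())
}

// Hypothetical sample value for trying the code out.
pub fn sample_article() -> NewsArticle {
    NewsArticle {
        headline: String::from("Penguins win"),
        author: String::from("Iceburgh"),
        location: String::from("Pittsburgh"),
    }
}
```

notify(&sample_article()) then works for NewsArticle and for any other type that later implements Summary.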

Functions and generics: fn largest<T>(list: &[T]) -> T { }
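
The signature above does not compile with a real body unless T is constrained. One possible version, assuming the bounds PartialOrd (for comparison) and Copy (to return a value out of the slice):

```rust
/// Returns the largest element of a non-empty slice.
/// Panics if `list` is empty, since there is no element to return.
fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
    let mut largest = list[0];
    for &item in list {
        if item > largest {
            largest = item;
        }
    }
    largest
}
```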

Lifetime annotations:

&i32        // a reference
&'a i32     // a reference with an explicit lifetime
&'a mut i32 // a mutable reference with an explicit lifetime
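
Two small functions showing where these annotations appear in practice. The names longest and bump are illustrative.

```rust
/// The returned reference lives no longer than the shorter-lived of `x` and `y`.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() {
        x
    } else {
        y
    }
}

/// Takes and returns a mutable reference with the same explicit lifetime.
fn bump<'a>(counter: &'a mut i32) -> &'a mut i32 {
    *counter += 1;
    counter
}
```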

Dropping a reference manually:
core::mem::drop(inner);
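
One case where an early drop matters is a RefCell borrow guard. The sketch below assumes inner is such a guard, which the note above does not state; drop_demo is a made-up name.

```rust
use std::cell::RefCell;

/// Releases a mutable borrow early so the cell can be borrowed again.
fn drop_demo() -> i32 {
    let cell = RefCell::new(1);

    let mut inner = cell.borrow_mut(); // exclusive borrow, checked at run time
    *inner += 1;
    drop(inner); // same function as core::mem::drop; ends the borrow here

    // Without the drop above, this borrow would panic at run time,
    // because `inner` would stay alive until the end of the function.
    let again = cell.borrow();
    *again
}
```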

Arrays: let mut app_start: [usize; MAX_APP_NUM + 1] = [0; MAX_APP_NUM + 1];

Other specifics

Creating a vector: let v: Vec<i32> = Vec::new(); or let v = vec![1, 2, 3];

Box: allocates memory on the heap.

// Move a value from the stack to the heap
let val: u8 = 5;
let boxed: Box<u8> = Box::new(val);

// Move a value from the heap back to the stack
let boxed: Box<u8> = Box::new(5);
let val: u8 = *boxed;

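string:

&str: a string slice; fixed length, cannot be mutated, a reference to a sequence of UTF-8 bytes
let greeting = "Hello there.";
"Hello there." is a string literal of type &'static str; it is laid out at compile time and its lifetime is the whole program
let s = String::from("hello"); has type String
let slice = &s[3..]; has type &str and is a slice; its length is slice.len()
s.as_str() has type &str
let s1 = "Hello world!"; has type &str and is a slice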
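
The String and &str notes above, put together in one sketch (string_demo is a hypothetical name). Slice indices are byte offsets, so they must fall on UTF-8 character boundaries.

```rust
/// Contrasts an owned, growable String with borrowed &str slices.
fn string_demo() -> (String, usize, bool) {
    let literal: &'static str = "Hello there."; // stored in the binary

    let mut s = String::from("hello"); // heap-allocated and growable
    s.push_str(" world");

    let slice: &str = &s[6..]; // borrows the bytes from offset 6 onward
    let whole: &str = s.as_str(); // borrows the entire String

    (slice.to_string(), whole.len(), literal.starts_with("Hello"))
}
```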

Array slices:

let a = [1,2,3,4,5];
let slice = &a[1..3]; has type &[i32] and is an array slice

More advanced concepts (?)

Ordinary pointers (references): two kinds, & and &mut

    let mut num = 5;
    let num_ref = &mut num;
    *num_ref = 100;

Fat pointers: for example slices, which store a pointer plus a length; two kinds, & and &mut

    let mut arr = [1, 2, 3, 4];
    let slice = &mut arr[1..4];
    slice[0] = 100;
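
One way to see the extra length field is to compare sizes: a slice reference takes two machine words, a plain reference takes one. A sketch (pointer_sizes is a made-up name):

```rust
use std::mem::size_of;

/// Returns (size of a thin reference, size of a slice reference) in bytes.
fn pointer_sizes() -> (usize, usize) {
    let thin = size_of::<&i32>(); // just an address
    let fat = size_of::<&[i32]>(); // address + element count
    (thin, fat)
}
```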

Raw pointers: similar to C++ pointers, and may be null
Creating a raw pointer is a safe operation; reading or writing through one is unsafe. Two kinds, *mut and *const

    let mut num = 1;
    // Convert a reference into a raw pointer
    let num_raw_point = &mut num as *mut i32;
    unsafe {
        *num_raw_point = 100;
        println!("{} {} {:p}", num, *num_raw_point, &num); 
        // Output: 100 100 0x8d8c6ff6bc
    }

Accessing an element through a *mut raw pointer (inside an unsafe block): *a.offset(1) = 1
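
A sketch of element access through a raw pointer, assuming the pointer comes from a local array (raw_pointer_demo is a hypothetical name). Both offsets stay inside the array, which is what makes the unsafe block sound here.

```rust
/// Writes to array elements through a *mut pointer.
fn raw_pointer_demo() -> [i32; 3] {
    let mut arr = [10, 20, 30];
    let p: *mut i32 = arr.as_mut_ptr(); // creating the raw pointer is safe

    unsafe {
        *p.offset(1) = 1; // element 1; offset takes a signed isize
        *p.add(2) += 5; // element 2; add takes an unsigned usize
    }

    arr
}
```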

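Put #[derive(Copy, Clone)] before a type definition to have the compiler implement the Copy/Clone traits automatically, so that passing a value as an argument copies it instead of moving ownership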
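
A sketch of the effect, using a made-up Point2 type (Debug and PartialEq are derived only so the values can be compared and printed):

```rust
#[derive(Copy, Clone, Debug, PartialEq)]
struct Point2 {
    x: i32,
    y: i32,
}

/// Receives its own copy of the point; the caller's value is untouched.
fn shifted(mut p: Point2) -> Point2 {
    p.x += 1;
    p
}

fn copy_demo() -> (Point2, Point2) {
    let a = Point2 { x: 1, y: 2 };
    let b = shifted(a); // `a` is copied, not moved
    (a, b) // so `a` is still usable here
}
```

Without the derive, the last line of copy_demo would fail to compile because a would have been moved into shifted.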

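Project structure

A mod.rs inside a directory, or a .rs file with the same name as the directory, acts as the module root that exports the directory's internal interfaces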
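
The same idea written as inline modules so it fits in one file. The module names shapes and circle are invented; in a real project the outer block would be shapes/mod.rs (or shapes.rs) and the inner one shapes/circle.rs.

```rust
mod shapes {
    // Private submodule: code outside `shapes` cannot name it directly.
    mod circle {
        pub fn area(radius: f64) -> f64 {
            std::f64::consts::PI * radius * radius
        }
    }

    // The module root decides what is exported and under which name.
    pub use self::circle::area as circle_area;
}

fn unit_circle_area() -> f64 {
    shapes::circle_area(1.0)
}
```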

Using cmake and makefile

Author: leenldk
Date: 2021-01-22
Category: technique

cmake

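Notes on some commands used with cmake
cmake is normally used with an out-of-source build, which keeps the build output outside the source tree.
For an out-of-source build, the source tree must not contain cmake-generated files. If the source tree contains a CMakeCache.txt, cmake treats that directory as a build tree.
cmake uses absolute paths, so a build tree cannot be copied elsewhere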

set(CMAKE_CXX_FLAGS "-std=c++14 -O2 -g -Wall ${OpenMP_CXX_FLAGS}") # set C++ compile flags
set(CUDA_NVCC_FLAGS "-Xcompiler -fopenmp -std=c++14 -O2 -g -arch=compute_70 -code=sm_70 --ptxas-options=-v -lineinfo -keep") # set CUDA compile flags
option(SHOW_SCHEDULE "Print the schedule" ON) # define an ON/OFF option
add_definitions(-DBACKEND=0) # add a preprocessor define
if (BACKEND STREQUAL "serial")
    add_definitions(-DBACKEND=0)
elseif(BACKEND STREQUAL "group")
    add_definitions(-DBACKEND=1)
else()
    message(FATAL_ERROR "invalid mode") # ERROR alone is not a message mode; FATAL_ERROR stops configuration
endif() # if usage; the endif() argument can be left empty
add_executable(${BENCHMARK} micro-benchmark/${BENCHMARK}.cpp) # build the executable from its source file

cmake -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON makes the generated makefile print the full commands it runs
set(CMAKE_CUDA_FLAGS "-Xcompiler -std=c++14 -O2 -g -arch=compute_70 -code=sm_70 -cudart=shared ") sets the CUDA flags


cmake_minimum_required(VERSION 3.13) # minimum cmake version (must be specified)
project(Demo1) # project information
add_executable(Demo a.cc b.cc) # build the executable Demo from a.cc and b.cc
aux_source_directory(. DIR_SRCS) # collect all source files in directory . into ${DIR_SRCS}
add_subdirectory(math) # add a subdirectory and process the CMakeLists.txt inside it
option (USE_MYMATH "Use provided math implementation" ON) # add an option

set(CMAKE_EXPORT_COMPILE_COMMANDS ON) # generate compile_commands.json, which lists every compile command
or on the command line: -DCMAKE_EXPORT_COMPILE_COMMANDS=on

CMAKE_CURRENT_BINARY_DIR : directory of the current subdirectory inside the build tree
CMAKE_CURRENT_SOURCE_DIR : current source directory
CMAKE_BINARY_DIR : top level of the build tree
CMAKE_SOURCE_DIR : top level of the source tree
EXECUTABLE_OUTPUT_PATH :

add_library()
set()
get_filename_component()
set_source_files_properties(GENERATED)

makefile

@ prefix: run the command without echoing it to the screen

$@ : target being generated
$< : first prerequisite
$^ : all prerequisites

all: library.cpp main.cpp

$@ evaluates to all
$< evaluates to library.cpp
$^ evaluates to library.cpp main.cpp

example:

# Define required macros here
SHELL = /bin/bash

OBJS = main.o factorial.o hello.o
CFLAGS = -Wall -g
CC = gcc

# Recipe lines below must start with a tab character
hello: ${OBJS}
	${CC} ${CFLAGS} -o $@ ${OBJS}

clean:
	-rm -f *.o core *.core

.cpp.o:
	${CC} ${CFLAGS} -c $<

QCSimulator notes

Author: leenldk
Date: 2021-01-19
Category: technique

compiler

Function getGroup: plans one group and returns a GateGroup

schedule

GateGroup: a group of gates
1. relatedQubits
2. state
3. cuttPlans

compile

backend options: group, mix, blas

Using CUDA libraries

Author: leenldk
Date: 2021-01-16
Category: technique

cublas

First create a cublas handle

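#include <cublas_v2.h>
#include <sstream> // for std::stringstream used in the macro
// FatalError is assumed to be defined elsewhere; it is not part of cublas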
#define checkCudaErrors(status) do {                                   \
    std::stringstream _error;                                          \
    if (status != 0) {                                                 \
      _error << "Cuda failure: " << status;                            \
      FatalError(_error.str());                                        \
    }                                                                  \
} while(0)


cublasHandle_t cublasH;
checkCudaErrors(cublasCreate(&cublasH));
// Subsequent library function calls take the handle as an explicit argument
cublasDestroy(cublasH);

curand

#include <curand.h>
curandGenerator_t curand;
curandCreateGenerator(&curand, CURAND_RNG_PSEUDO_DEFAULT);
curandSetPseudoRandomGeneratorSeed(curand, 123ULL);
curandGenerateUniform(curand, p, size);

NCCL

NVIDIA's collective communication library: communication primitives for multiple GPUs across multiple nodes.
Supports all-reduce, all-gather, and others

Profiler tools

Author: leenldk
Date: 2021-01-15
Category: technique

vtune

Intel's profiler

source /home/leenldk/intel/oneapi/vtune/2021.2.0/env/vars.sh  # load the environment

gprof

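Open-source GNU profiling tool, used with gcc

Compile with the -pg option to add instrumentation
Running the program then produces gmon.out
Use gprof to turn that into a profiling report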

gcc example.c -o temp -g -pg
./temp
gprof temp > profiling.out

nvprof

Update: nvprof no longer supports the newest GPUs; use nsys and ncu instead

A tool bundled with the CUDA toolkit
Usage:

nvprof ./gemm # print the profiling result
# when unified memory is used, --unified-memory-profiling off may be needed
nvprof --unified-memory-profiling off ./gemm

-o prof.nvvp : write the output to an nvvp file
--metrics [all/gld_throughput] : profile all metrics / Global Load Throughput (may require sudo)

Visualization: with X11 forwarding, run nvvp prof.nvvp
With CUDA 11 there may be a Java problem; in that case:
sudo apt install openjdk-8-jdk
nvvp -vm /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java prof.nvvp

windows :
.\nvvp.exe -vm 'D:\Program Files\Java\jdk1.8.0_311\jre\bin\java.exe'

nsys (Nsight Systems)

Coarse-grained timeline profiling

ncu (Nsight Compute)

Fine-grained profiling at the level of a single kernel
ncu --list-sets lists the supported metric section sets

--set full
-o file