
6.30

Started reading the paper my senior Shizhi gave me.
The introduction points out that BERT's pre-training is extremely expensive:
1024 V100 GPUs for 1 day.
BERT-Large results are therefore hard to reproduce on 12 GB–16 GB GPUs.

BERT stacks multiple layers of bidirectional Transformers;
each Transformer block contains a multi-head self-attention layer and a position-wise feed-forward layer.

counter <= (0      => '1',
            4      => '1',
            others => '0');  -- signal assignment via an aggregate
rising_edge(clk)  -- function that detects a rising clock edge

#pragma omp simd
Placed before a for loop: explicitly request SIMD vectorization of that loop.

#pragma omp declare simd
Placed before a function: have the compiler generate a SIMD (vector) version of it, callable from vectorized loops.

#pragma ivdep
Placed before a for loop: tell the compiler to ignore assumed vector dependences.

#pragma vector nontemporal
Use non-temporal (streaming) stores that bypass the cache hierarchy and write directly to memory.

#include <omp.h>
int nt = omp_get_max_threads();

Maximum number of OpenMP threads available.

#pragma omp parallel private(A) shared(B)
{
    int C;
    omp_get_thread_num();
}

Run with multiple OpenMP threads:
each thread gets its own private copy of A, while B is shared across all threads;
C, declared inside the parallel region, is private to each thread as well.

export OMP_NUM_THREADS=5
Sets the number of OpenMP threads to use (here 5).

fork thread:

#include <pthread.h>
int pthread_create(pthread_t *tidp, const pthread_attr_t *attr,
                   void *(*start_rtn)(void *), void *arg);

Spawns a new thread.
Link with -lpthread.

fork process:

pid = fork();

fork() returns 0 in the child process and the child's PID (non-zero) in the parent.

Notes from the official MOOC

Factual questions: evaluate each option to determine whether it is correct.
For Negative Factual Information questions, remember that you're looking for an answer that either isn't in the paragraph or directly contradicts information in the paragraph.

Two goals for my 2020 self:
1. Compete in the student supercomputing competition, and ideally win the championship together with the seniors.
2. Publish a paper. The goal is far off, but I'll do my best.

That's all.