Some hardware parameters
A100
peak FP64 : 9.7 TFLOPS
peak FP32 : 19.5 TFLOPS
peak FP16 : 78 TFLOPS
peak TF32 tensor core : 156 TFLOPS
192 KB combined L1 cache / shared memory per SM
40 MB L2 cache
40 GB device memory (HBM2), 1555 GB/s bandwidth
PCIe 4.0 x16 : 31.5 GB/s per direction
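
One common use of these numbers is a quick roofline estimate: dividing a peak throughput by the memory bandwidth gives the arithmetic intensity (FLOP per byte moved) above which a kernel stops being memory-bound. The sketch below is only an illustration built from the figures listed above. The helper names `ridge_point` and `attainable_flops` are made up for this note, and the model ignores caches, latency, and achievable (as opposed to peak) bandwidth.

```python
# Peak throughput (FLOP/s) and bandwidth (bytes/s), copied from the A100 list above.
PEAK_FLOPS = {
    "FP64": 9.7e12,
    "FP32": 19.5e12,
    "FP16": 78e12,
    "TF32 tensor core": 156e12,
}
HBM_BANDWIDTH = 1555e9   # device memory bandwidth
PCIE_BANDWIDTH = 31.5e9  # PCIe 4 host <-> device link


def ridge_point(peak_flops, bandwidth):
    """Arithmetic intensity (FLOP/byte) where compute time equals memory time."""
    return peak_flops / bandwidth


def attainable_flops(intensity, peak_flops, bandwidth):
    """Simple roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * bandwidth)


if __name__ == "__main__":
    for name, peak in PEAK_FLOPS.items():
        hbm = ridge_point(peak, HBM_BANDWIDTH)
        pcie = ridge_point(peak, PCIE_BANDWIDTH)
        print(f"{name}: {hbm:.1f} FLOP/byte vs HBM, {pcie:.0f} FLOP/byte vs PCIe")

    # Hypothetical example: FP32 SAXPY (y = a*x + y) does 2 FLOP per element
    # and moves 12 bytes (read x, read y, write y), so intensity is 1/6.
    bound = attainable_flops(2 / 12, PEAK_FLOPS["FP32"], HBM_BANDWIDTH)
    print(f"SAXPY upper bound: {bound / 1e9:.0f} GFLOP/s")
```

Under this model an FP32 kernel needs roughly 12.5 FLOP per byte of device-memory traffic to reach the compute roof. Anything streaming data over PCIe needs several hundred FLOP per byte, which is why data is usually kept resident on the device.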