Blackwell SM Architecture
Streaming Multiprocessor (SM)
Tensor Core (tcgen05): 5th-gen MMA
CUDA Core: FP/INT units
Shared Memory (SMEM): 228 KB per SM
Tensor Memory (TMEM): 128 lanes
Register File
TMA Engine: data mover
SM × N
Global Memory (GMEM)
Solid arrows = load path; dashed arrows = store path.