Async Coordination: Barriers
HW units signal mbarrier on completion — warp groups wait before proceeding
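[Diagram: data path] GMEM (A, B) -> SMEM -> TMEM (accum) -> Reg (cast) -> SMEM -> GMEM (D)
[Diagram: warp timeline] Warp 0 (producer): TMA dispatch | Warp 1 (consumer): tcgen05 dispatch | Warp 0 (next phase): TMA dispatch
[Diagram: barrier signals] TMA Engine -> mbar "TMA ready" | Tensor Core -> mbar "SMEM free"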
Coordination:
- mbarrier (mbar) — used to coordinate asynchronous operations across threads and hardware units
- Different warps work on different pipeline stages in parallel: the producer warp dispatches TMA loads while the consumer warp dispatches tcgen05 operations
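
The handshake above can be sketched on the host as a rough analogy. This is not GPU code and not the real mbarrier API: `PhaseBarrier`, `run_pipeline`, and `kStages` are hypothetical names, two CPU threads stand in for the producer and consumer warps, and a plain array stands in for the SMEM stages. The sketch assumes a two-stage buffer and models only the part the slide describes: a "TMA ready" barrier and an "SMEM free" barrier per stage, each tracked by an arrival count and a phase bit.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical host-side stand-in for an mbarrier:
// an expected arrival count plus a phase bit that flips on completion.
class PhaseBarrier {
public:
    PhaseBarrier(int expected = 1) : expected_(expected), pending_(expected) {
        assert(expected > 0);
    }

    // Signal one completion. The last expected arrival completes the phase.
    void arrive() {
        std::lock_guard<std::mutex> lock(m_);
        if (--pending_ == 0) {
            pending_ = expected_;  // re-arm for the next phase
            phase_ ^= 1;           // flip the phase bit
            cv_.notify_all();
        }
    }

    // Block until the phase with the given parity has completed.
    // On a fresh barrier, wait(1) returns immediately and wait(0) blocks.
    void wait(int parity) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return phase_ != parity; });
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    const int expected_;
    int pending_;
    int phase_ = 0;  // parity of the phase currently in progress
};

constexpr int kStages = 2;  // assumed buffer depth for this sketch

// Runs `tiles` iterations and returns the consumer's accumulated sum.
long run_pipeline(int tiles) {
    PhaseBarrier full[kStages];   // "TMA ready": data has landed in the stage
    PhaseBarrier empty[kStages];  // "SMEM free": the stage may be overwritten
    int smem[kStages] = {0, 0};   // stand-in for the shared-memory stages
    long accum = 0;               // stand-in for the accumulator

    std::thread producer([&] {
        for (int i = 0; i < tiles; ++i) {
            int stage = i % kStages;
            int parity = (i / kStages) & 1;
            empty[stage].wait(parity ^ 1);  // first pass returns immediately
            smem[stage] = i + 1;            // the "load" fills the stage
            full[stage].arrive();           // signal "TMA ready"
        }
    });

    std::thread consumer([&] {
        for (int i = 0; i < tiles; ++i) {
            int stage = i % kStages;
            int parity = (i / kStages) & 1;
            full[stage].wait(parity);       // wait for "TMA ready"
            accum += smem[stage];           // the "MMA" consumes the stage
            empty[stage].arrive();          // signal "SMEM free"
        }
    });

    producer.join();
    consumer.join();
    return accum;
}
```

Calling `run_pipeline(8)` pushes the values 1 through 8 through the two stages. The producer waits on the opposite parity the first time it touches each stage, so it can fill both stages before the consumer has released anything; after that, each side only advances when the other has flipped the matching barrier's phase.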