MBarrier to Signal tcgen05 Completion

[Figure: timeline diagram with four lanes: T0 (producer), Tensor Core, MBar, and T1 (consumer). Legend: working / active; signaling operation; dispatch to Tensor Core; blocked / waiting; barrier state; idle; signaling to change mbar state; sync threads after init.]

Producer (T0)

T0 issues tcgen05.mma, an asynchronous matrix multiply, and then tcgen05.commit(mbar), which tells the barrier to track completion of that work. When the work finishes, the hardware performs an arrive::one on mbar.

Tensor Core

Executes the tcgen05.mma and tcgen05.commit operations queued by T0.

Consumer (T1)

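mbarrier.try_wait(mbar, phase) suspends T1 until the given phase of mbar completes. The wait can also return unsuccessfully after a system-dependent time limit, so T1 retries until it reports success. At that point the result is available in TMEM.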
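
The handshake described above can be sketched on the host as a small C++ model. This is not device code and does not use the real tcgen05 or mbarrier instructions: `ModelMBarrier`, `arrive_one`, `try_wait`, and `run_handshake` are hypothetical names chosen for illustration, a `std::thread` stands in for the Tensor Core, and a plain `int` stands in for TMEM. It only aims to show the phase-parity idea: one expected arrival, a phase bit that flips when the arrival lands, and a consumer that retries until the phase it is waiting on has completed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical host-side stand-in for an mbarrier: a pending-arrival count
// plus a phase bit. The real object lives in shared memory and is opaque.
struct ModelMBarrier {
    explicit ModelMBarrier(uint32_t expected_arrivals)
        : expected(expected_arrivals), pending(expected_arrivals) {}

    // Models arrive::one. The last expected arrival completes the phase:
    // the count is re-armed and the phase bit flips.
    void arrive_one() {
        if (pending.fetch_sub(1) == 1) {
            pending.store(expected);
            phase.fetch_xor(1u);
        }
    }

    // Models a parity wait: true once the phase with this parity has
    // completed, i.e. the current phase bit has moved past it.
    bool try_wait(uint32_t parity) const {
        return phase.load() != parity;
    }

    const uint32_t expected;
    std::atomic<uint32_t> pending;
    std::atomic<uint32_t> phase{0};
};

// Runs one producer -> "Tensor Core" -> consumer handshake and returns the
// value the consumer reads after the barrier phase completes.
int run_handshake() {
    ModelMBarrier mbar(1);  // init: one expected arrival
    int tmem = 0;           // stand-in for the accumulator in TMEM
    int observed = -1;

    // T0 has "issued" the MMA and "committed" it to mbar. This worker plays
    // the Tensor Core: it does the work, then arrives on the barrier.
    std::thread tensor_core([&] {
        tmem = 6 * 7;       // the "matrix multiply"
        mbar.arrive_one();  // completion signal, like the hardware arrive::one
    });

    // T1 (consumer): retry until phase 0 completes, then read the result.
    std::thread t1([&] {
        while (!mbar.try_wait(0)) {
            std::this_thread::yield();
        }
        observed = tmem;
    });

    tensor_core.join();
    t1.join();
    return observed;
}
```

Calling `run_handshake()` from a `main` returns the value written by the worker thread. The consumer never reads `tmem` before the arrival because it only proceeds once the phase bit has flipped, which is the same ordering guarantee the figure illustrates for TMEM.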