MBarrier to Signal TMA Completion

T0 (producer)
TMA Engine
MBar
T1 (consumer)
Timeline
Working / active
Signaling operation
Dispatch to TMA
Blocked / waiting
Barrier state
Idle
Signaling to change mbar state
Sync threads after init

Producer (T0)

arrive.expect_tx(4096) decrements pending to 0 and sets tx_count. cp.async.bulk triggers TMA transfer.

TMA Engine

Executes cp.async.bulk dispatched from T0. Auto-decrements tx_count on completion.

Consumer (T1)

try_wait suspends until pending==0 AND tx==0. HW auto-decrements tx_count when transfer completes.