mbarrier.init(mbar, 1) — setup once across all runs, explicit sync after setup.
mbarrier.init(mbar, 1)
mbarrier.arrive(mbar) decrements pending_count. When it hits 0, phase completes.
mbarrier.arrive(mbar)
mbarrier.try_wait(mbar, phase) suspends T1 until phase completes.
mbarrier.try_wait(mbar, phase)