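tcgen05.mma issues an asynchronous matrix multiply. tcgen05.commit(mbar) makes the mbarrier track completion of the tcgen05 operations the thread issued before it; when they finish, the hardware performs a single arrive on the barrier (mbarrier::arrive::one).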
The hardware executes the tcgen05.mma and tcgen05.commit operations queued by T0.
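mbarrier.try_wait(mbar, phase) may suspend T1 until the phase completes. It can also return false after a system-dependent time limit, so T1 retries it in a loop until it returns true. The result is then available in TMEM.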
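
The issue / commit / wait handshake can be sketched as a host-side toy model. This is plain Python with threads, not PTX or CUDA, and it only mirrors the ordering described here. All names (ToyMbarrier, tensor_core, run_demo, the tmem dict) are hypothetical, and the in-order work queue is a simplifying assumption standing in for the hardware tracking prior asynchronous operations.

```python
import queue
import threading


class ToyMbarrier:
    """Toy model of an mbarrier: an expected-arrival count plus a phase bit."""

    def __init__(self, expected_arrivals=1):
        self._expected = expected_arrivals
        self._pending = expected_arrivals
        self._phase = 0  # parity of the phase currently in progress
        self._cond = threading.Condition()

    def arrive(self):
        """One arrival; the last pending arrival completes the phase."""
        with self._cond:
            self._pending -= 1
            if self._pending == 0:
                self._phase ^= 1               # flip to the next phase
                self._pending = self._expected
                self._cond.notify_all()

    def try_wait(self, parity, timeout=0.01):
        """Block for at most `timeout` seconds; True once the phase with
        the given parity has completed, False if the wait timed out."""
        with self._cond:
            if self._phase == parity:
                self._cond.wait(timeout)
            return self._phase != parity


def tensor_core(work, tmem):
    """Stand-in for the hardware: runs queued operations in order."""
    while True:
        op = work.get()
        if op is None:                         # shutdown sentinel
            break
        kind, payload = op
        if kind == "mma":
            a, b = payload
            tmem["d"] = [[sum(x * y for x, y in zip(row, col))
                          for col in zip(*b)] for row in a]
        elif kind == "commit":
            payload.arrive()                   # the single arrive on the barrier


def run_demo():
    work, tmem, result = queue.Queue(), {}, {}
    mbar = ToyMbarrier(expected_arrivals=1)
    hw = threading.Thread(target=tensor_core, args=(work, tmem))
    hw.start()

    def t1():
        # try_wait may time out, so retry until the phase has completed.
        while not mbar.try_wait(parity=0):
            pass
        result["d"] = tmem["d"]                # safe to read only after the wait

    waiter = threading.Thread(target=t1)
    waiter.start()

    # T0: queue the multiply, then the commit that signals its completion.
    work.put(("mma", ([[1, 2], [3, 4]], [[5, 6], [7, 8]])))
    work.put(("commit", mbar))

    waiter.join()
    work.put(None)
    hw.join()
    return result["d"]
```

Calling run_demo() returns the 2x2 product that T1 read after its wait succeeded. The retry loop in t1 is the part that carries over to real code: a single try_wait is not guaranteed to block until the phase completes.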