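In a pipelined kernel, the same barriers are reused on every iteration of the K-loop. Phase tracking answers the question
"did this barrier fire for iteration i or iteration i-1?": the waiter keeps a one-bit phase, passes it to each wait, and flips it after the wait returns.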
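phase_tma = 0                      # parity of the phase we expect to complete next
for k in range(K):
    try_wait(tma_bar, phase_tma)   # wait for the phase with this parity to complete
    phase_tma ^= 1                 # flip: the next iteration waits on the other parity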
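
To see why the flip matters, here is a minimal host-side sketch in plain Python. It is a toy model, not a real GPU API: `ToyBarrier`, `arrive`, and `run` are hypothetical names introduced for illustration, and the model assumes the barrier carries a single parity bit that flips each time a phase completes, with `try_wait(parity)` returning True once the phase with that parity has completed.

```python
class ToyBarrier:
    """Toy model of a reusable barrier with a 1-bit phase (hypothetical, not a real API)."""

    def __init__(self):
        self.phase = 0  # parity of the phase currently in progress

    def arrive(self):
        # Completing the current phase flips the barrier's parity.
        self.phase ^= 1

    def try_wait(self, parity):
        # True once the phase with the given parity has completed,
        # i.e. the barrier has already moved on to the other parity.
        return self.phase != parity


def run(K, flip=True):
    """Record what the waiter sees before and after the producer arrives."""
    bar = ToyBarrier()
    phase_tma = 0
    log = []
    for k in range(K):
        before = bar.try_wait(phase_tma)  # producer has not arrived for iteration k yet
        bar.arrive()                      # producer completes iteration k
        after = bar.try_wait(phase_tma)
        log.append((k, phase_tma, before, after))
        if flip:
            phase_tma ^= 1                # track the barrier's next phase
    return log


if __name__ == "__main__":
    for row in run(4):
        print("flip   ", row)
    for row in run(4, flip=False):
        print("no flip", row)
```

With the flip, every iteration in this model sees `before == False` and `after == True`, so the wait only succeeds once the data for that iteration has arrived. Without it, the wait in iteration 1 succeeds immediately on the completion left over from iteration 0, which is exactly the confusion phase tracking is meant to prevent.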