Step 9: 2-CTA Cluster — Code
Same three-role structure as Step 7; only the highlighted lines change.
Roles: TMA Producer (WG1/warp3) · MMA Consumer (WG1/warp0) · Writeback (WG0)
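The role split carries over from Step 7, so threads are still routed to roles by warpgroup and warp. Below is a minimal Python model of that routing. It assumes 128-thread warpgroups numbered in thread-index order. `Role` and `role_of` are hypothetical names. Treating the WG1 warps the slide does not name (warp1, warp2) as idle is an assumption.

```python
from enum import Enum

WARP_SIZE = 32
WARPGROUP_SIZE = 4 * WARP_SIZE  # a warpgroup is 4 consecutive warps = 128 threads


class Role(Enum):
    TMA_PRODUCER = "tma_producer"  # WG1/warp3
    MMA_CONSUMER = "mma_consumer"  # WG1/warp0
    WRITEBACK = "writeback"        # all of WG0
    IDLE = "idle"                  # assumption: warps the slide does not assign


def role_of(tid: int) -> Role:
    """Hypothetical helper: map a linear thread index inside one CTA to its role."""
    wg = tid // WARPGROUP_SIZE           # warpgroup index
    warp_in_wg = (tid // WARP_SIZE) % 4  # warp index within that warpgroup
    if wg == 0:
        return Role.WRITEBACK
    if wg == 1 and warp_in_wg == 3:
        return Role.TMA_PRODUCER
    if wg == 1 and warp_in_wg == 0:
        return Role.MMA_CONSUMER
    return Role.IDLE


if __name__ == "__main__":
    # Role of the first thread of each warp in a 256-thread CTA.
    for warp in range(8):
        print(warp, role_of(warp * WARP_SIZE).value)
```

In this model, WG1/warp3 covers threads 224 to 255 and WG1/warp0 covers threads 128 to 159. Going to a 2-CTA cluster does not change that mapping inside each CTA.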
Key changes: CTA_GROUP=2 · remote_view(0) for cross-CTA barrier · cbx==0 guards · cta_mask=1→3 · cluster_sync
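The key changes can be checked against a toy model. The Python sketch below is not kernel code. It treats each mbarrier as a plain arrival counter, and `cta_mask`, `ranks_in_mask` and `one_k_step` are hypothetical names. The model rests on four assumptions:

- `cbx` is a CTA's rank in the cluster.
- `remote_view(0)` makes every producer signal the copy of the barrier held in CTA 0.
- The `cbx==0` guard restricts MMA issue to rank 0.
- `cta_mask` is a bitmask over cluster ranks, where bit i selects rank i.

Under the last assumption, the mask grows from 1 to 3 when `CTA_GROUP` goes from 1 to 2.

```python
from typing import Dict, List

CTA_GROUP = 2  # CTAs cooperating on one MMA (1 in the single-CTA steps)


def cta_mask(cta_group: int) -> int:
    """One bit per CTA rank: 1 CTA -> 0b1 (1), 2 CTAs -> 0b11 (3)."""
    return (1 << cta_group) - 1


def ranks_in_mask(mask: int, cluster_size: int) -> List[int]:
    """Cluster ranks whose bit is set in the mask."""
    return [r for r in range(cluster_size) if (mask >> r) & 1]


def one_k_step(cta_group: int = CTA_GROUP) -> Dict[str, object]:
    """Simplified model of one pipeline stage across the cluster."""
    load_bar = [0] * cta_group  # "tile landed" barrier in each CTA's shared memory
    done_bar = [0] * cta_group  # "MMA finished" barrier in each CTA's shared memory

    # TMA producers: every CTA arrives on the barrier that lives in CTA 0
    # (the role remote_view(0) plays), not on its own local copy.
    for _cbx in range(cta_group):
        load_bar[0] += 1

    # MMA consumer: behind the cbx == 0 guard only rank 0 issues, and only
    # once arrivals from every CTA in the group have been counted.
    ready = load_bar[0] == cta_group
    issuers = [cbx for cbx in range(cta_group) if cbx == 0]

    # Completion is signalled on the barrier of every rank selected by the mask.
    for rank in ranks_in_mask(cta_mask(cta_group), cta_group):
        done_bar[rank] += 1

    return {"ready": ready, "issuers": issuers,
            "load_bar": load_bar, "done_bar": done_bar}


if __name__ == "__main__":
    print(cta_mask(1), cta_mask(2))  # 1 3
    print(one_k_step())
```

With `CTA_GROUP=2`, CTA 0's load barrier collects two arrivals and CTA 1's stays at zero. A mask of 3 reaches both CTAs' completion barriers, whereas a mask of 1 would leave CTA 1 waiting forever.

The model leaves out `cluster_sync`. In CUDA, a cluster-wide barrier is typically needed for two reasons. First, CTA 0's barriers must be initialized before a peer signals them remotely. Second, no CTA should exit while a peer may still access its shared memory.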