Layout: S[(2,4,8) : (1@gpuid_y, 8@m, 1@m)] + R[2 : 1@gpuid_x]
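R = replicated dimension; k@a denotes a stride of k along axis a, where a is a GPU mesh coordinate (gpuid_x, gpuid_y) or the local memory offset m.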
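One way to read S[shape : stride] is as a shape paired with axis-tagged strides. Under that reading, each mode's coordinate is multiplied by its stride k@a and added to the running offset on axis a. The sketch below is a minimal illustration of that assumption, inferred from the address computation in this figure rather than taken from any library API. The names `eval_layout`, `S_SHAPE`, and `S_STRIDE`, and the `(k, axis)` tuple encoding, are all hypothetical.

```python
from collections import defaultdict

# Hypothetical encoding: each stride is a (k, axis) pair standing for k@axis.
Shape = tuple[int, ...]
Stride = tuple[tuple[int, str], ...]


def eval_layout(shape: Shape, stride: Stride, coord: tuple[int, ...]) -> dict[str, int]:
    """Map a per-mode coordinate to an offset along each named axis."""
    if not (len(shape) == len(stride) == len(coord)):
        raise ValueError("shape, stride and coord must have the same rank")
    offsets: dict[str, int] = defaultdict(int)
    for extent, (k, axis), c in zip(shape, stride, coord):
        if not 0 <= c < extent:
            raise IndexError(f"coordinate {c} out of range for extent {extent}")
        offsets[axis] += c * k  # contribution c * (k@axis)
    return dict(offsets)


# S[(2,4,8) : (1@gpuid_y, 8@m, 1@m)]
S_SHAPE = (2, 4, 8)
S_STRIDE = ((1, "gpuid_y"), (8, "m"), (1, "m"))

print(eval_layout(S_SHAPE, S_STRIDE, (1, 2, 3)))  # {'gpuid_y': 1, 'm': 19}
```

The per-axis dictionary keeps mesh coordinates and the memory offset separate. A single flat integer could not express that the first mode selects a GPU instead of a memory location.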
[Figure: 8×8 matrix distributed over a 2×2 GPU mesh]
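Address computation:
$$\mathrm{addr}(i, j) = \lfloor i/4 \rfloor \cdot 1@\mathrm{gpuid\_y} + (i \bmod 4) \cdot 8@m + j \cdot 1@m$$
Row i therefore lives on the GPU with gpuid_y = ⌊i/4⌋, and element (i, j) sits at local offset m = 8·(i mod 4) + j, so each GPU row holds a contiguous 4×8 block (offsets 0 to 31).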
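The address computation can be checked directly for the 8×8 case. The sketch below assumes that the gpuid_y term selects the GPU row and the two m terms give a row-major local offset. The function name `addr` and its `(gpuid_y, m)` return pair are hypothetical. The sketch confirms that the mapping assigns every element a distinct slot.

```python
def addr(i: int, j: int) -> tuple[int, int]:
    """Hypothetical helper: (gpuid_y, local offset m) of element (i, j)."""
    if not (0 <= i < 8 and 0 <= j < 8):
        raise IndexError("(i, j) must lie inside the 8x8 matrix")
    gpuid_y = (i // 4) * 1   # (i//4) x 1@gpuid_y
    m = (i % 4) * 8 + j * 1  # (i%4) x 8@m + j x 1@m
    return gpuid_y, m


# Every element gets a distinct (gpuid_y, m) slot, and each GPU row
# stores its 4x8 block at local offsets 0..31.
slots = {addr(i, j) for i in range(8) for j in range(8)}
assert len(slots) == 64
for y in range(2):
    assert sorted(m for (gy, m) in slots if gy == y) == list(range(32))

print(addr(5, 3))  # (1, 11)
```

Row 5 falls in the second half of the matrix (⌊5/4⌋ = 1) and is the second row of that GPU's block, so column 3 lands at 8 + 3 = 11.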
R[2 : 1@gpuid_x]: the size-2 mode along gpuid_x is replicated, so the GPUs at gpuid_x = 0 and gpuid_x = 1 hold the same data.
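To see the effect of the replicated mode, you can list what each GPU of the 2×2 mesh holds. The sketch below assumes that gpuid_x appears only in R and never in S, so it picks a replica without changing which elements are stored. The helper `local_elements` and the `mesh` dictionary are hypothetical illustrations, not part of any library.

```python
from itertools import product


def local_elements(gpuid_x: int, gpuid_y: int) -> set[tuple[int, int]]:
    """Hypothetical helper: the (i, j) elements held by GPU (gpuid_x, gpuid_y).

    gpuid_x does not enter the sharded part S, so ownership depends on
    gpuid_y alone; gpuid_x only selects one of the two replicas.
    """
    if gpuid_x not in (0, 1) or gpuid_y not in (0, 1):
        raise ValueError("mesh is 2x2")
    return {(i, j) for i, j in product(range(8), range(8)) if i // 4 == gpuid_y}


mesh = {(x, y): local_elements(x, y) for x, y in product(range(2), range(2))}

for y in range(2):
    # Replicas along gpuid_x hold identical data.
    assert mesh[(0, y)] == mesh[(1, y)]
    assert len(mesh[(0, y)]) == 32

# Each matrix element appears on exactly 2 GPUs (the replication factor).
counts = {e: sum(e in elems for elems in mesh.values()) for e in product(range(8), range(8))}
print(set(counts.values()))  # {2}
```

The mesh stores 4 × 32 = 128 element slots for a 64-element matrix. This is the 2× memory cost of replicating along gpuid_x.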