Swizzle Layout — SWIZZLE_128B
physical_sector = logical_sector XOR row
Layout: 8 rows × 32 banks, 4 banks (16 bytes) per sector, so 8 sectors per row.
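The mapping above can be sketched as a small function. This is a minimal illustration, not a vendor implementation: the names `physical_bank`, `ROWS`, `BANKS`, and `BANKS_PER_SECTOR` are hypothetical, and the sketch assumes the bank offset inside a sector is left unchanged and that the row index wraps every 8 rows.

```python
ROWS = 8               # rows in one swizzle period (assumed to wrap after 8)
BANKS = 32             # 4-byte banks per 128-byte row
BANKS_PER_SECTOR = 4   # one sector = 4 banks = 16 bytes
SECTORS = BANKS // BANKS_PER_SECTOR  # 8 sectors per row


def physical_bank(row: int, col: int) -> int:
    """Map a logical (row, col) element to the bank it occupies after swizzling."""
    logical_sector = col // BANKS_PER_SECTOR
    offset_in_sector = col % BANKS_PER_SECTOR      # assumed unchanged by the swizzle
    physical_sector = logical_sector ^ (row % ROWS)
    return physical_sector * BANKS_PER_SECTOR + offset_in_sector


# Logical sector 0 (columns 0..3) lands in a different physical sector on every row:
# row r occupies banks 4*r .. 4*r + 3.
for row in range(ROWS):
    print(row, [physical_bank(row, col) for col in range(BANKS_PER_SECTOR)])
```

Because XOR with a fixed row value is its own inverse, applying the same mapping to a physical sector recovers the logical one.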
Read
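Each element is 4 bytes and occupies one bank. A warp of 32 threads reads 32 elements per request; when k threads hit different addresses in the same bank (a k-way conflict), the accesses serialize and the read takes k cycles.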
Column Tile (8×4): 8 rows × 4 cols (1 sector) = 32 reads.
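With swizzle: 1 cycle, because XOR with the row index sends each row's sector to a different physical sector, so the 32 reads cover 32 distinct banks. Without: 8 cycles (8-way conflicts), because all 8 rows land on the same 4 banks.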
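The two cycle counts for the column tile can be reproduced with a short simulation. It is a sketch under a simplified cost model (cycles equal the largest number of reads that fall on one bank), and the helper names `bank` and `cycles` are hypothetical.

```python
from collections import Counter

BANKS_PER_SECTOR = 4
ROWS = 8


def bank(row, col, swizzle):
    """Bank index of element (row, col), with or without the XOR swizzle."""
    sector = col // BANKS_PER_SECTOR
    if swizzle:
        sector ^= row % ROWS
    return sector * BANKS_PER_SECTOR + col % BANKS_PER_SECTOR


def cycles(accesses, swizzle):
    """Simplified model: cycles = the largest number of reads hitting one bank."""
    hits = Counter(bank(r, c, swizzle) for r, c in accesses)
    return max(hits.values())


# Column tile: 8 rows x 4 columns (logical sector 0), one element per thread.
column_tile = [(r, c) for r in range(8) for c in range(4)]

print(cycles(column_tile, swizzle=True))   # prints 1
print(cycles(column_tile, swizzle=False))  # prints 8
```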
Row Tile (1×32): 1 row × 32 cols = 32 reads.
Always 1 cycle (all 32 banks distinct within a row).
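The row-tile result follows from the XOR only permuting sectors within a row. A minimal check, using the hypothetical helper `swizzled_bank` and assuming the row index wraps every 8 rows:

```python
BANKS = 32
BANKS_PER_SECTOR = 4
ROWS = 8


def swizzled_bank(row, col):
    """Bank index of element (row, col) after the XOR swizzle."""
    sector = (col // BANKS_PER_SECTOR) ^ (row % ROWS)
    return sector * BANKS_PER_SECTOR + col % BANKS_PER_SECTOR


# For each row, collect the banks touched by a full 1x32 row tile.
row_banks = {
    row: [swizzled_bank(row, col) for col in range(BANKS)]
    for row in range(ROWS)
}

# Every row touches each of the 32 banks exactly once, so no read conflicts.
conflict_free = all(sorted(b) == list(range(BANKS)) for b in row_banks.values())
print(conflict_free)  # prints True
```

Without the swizzle the same holds trivially, since column c of any row sits in bank c.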