GEMM Optimization
From shared memory tiling to TMA data movement
Design shared memory layout first, then configure TMA to fill it efficiently.
Tiling shape → Best swizzle → Swizzle-constrained tiling → TMA with swizzle
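
To make the swizzle steps concrete, here is a minimal sketch of a 128-byte XOR swizzle applied to a shared memory tile. It assumes a row pitch of 128 bytes, 32 banks of 4 bytes each, and a swizzle that XORs the 16-byte chunk index with the row index modulo 8. The helper names (swizzle_128b, chunk_slot, column_slots, max_inner_elems) are hypothetical and chosen for illustration; this is a host-side model of the address mapping, not device code. The last helper encodes the assumption behind "swizzle-constrained tiling": the tile's inner dimension in bytes should not exceed the swizzle span.

```python
# Host-side model of a 128-byte XOR swizzle (assumed layout, illustration only).
CHUNK_BYTES = 16   # swizzle granularity: one 16-byte chunk covers 4 banks
ROW_BYTES = 128    # assumed row pitch; equals 32 banks * 4 bytes


def swizzle_128b(offset):
    """XOR the 16-byte chunk index (bits 4-6) with the row index mod 8 (bits 7-9)."""
    return offset ^ (((offset >> 7) & 0x7) << 4)


def chunk_slot(offset):
    """Return which of the eight 16-byte slots (4-bank groups) an offset lands in."""
    return (offset % ROW_BYTES) // CHUNK_BYTES


def column_slots(chunk, rows=8, swizzled=True):
    """Slots touched when reading the same logical chunk from consecutive rows."""
    offsets = [r * ROW_BYTES + chunk * CHUNK_BYTES for r in range(rows)]
    if swizzled:
        offsets = [swizzle_128b(o) for o in offsets]
    return [chunk_slot(o) for o in offsets]


def max_inner_elems(swizzle_bytes, elem_bytes):
    """Largest inner tile dimension (in elements) that fits in one swizzle span."""
    return swizzle_bytes // elem_bytes


if __name__ == "__main__":
    # Without swizzle, a column read hits the same 4 banks in every row.
    print(column_slots(0, swizzled=False))  # [0, 0, 0, 0, 0, 0, 0, 0]
    # With swizzle, the eight rows spread across all eight slots.
    print(column_slots(0, swizzled=True))   # [0, 1, 2, 3, 4, 5, 6, 7]
    # Under the stated assumption, fp16 tiles get at most 64 inner elements.
    print(max_inner_elems(128, 2))          # 64
```

In this model the swizzle is a permutation of the chunks within each 8-row block, so it changes where data sits in shared memory without changing how much is stored. That is why the layout is picked first: the TMA descriptor then has to be configured with the matching swizzle mode so the copy writes data in the order the compute code expects to read it.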