tcgen05.mma

Parameters: Transpose A, Transpose B, M (output rows), N (output cols), K, Row group, Col group

A (SMEM) × B (SMEM) → Tensor Core (tcgen05.mma) → C (TMEM)

128 Lanes (M)
Col (N), allocated by tcgen05.alloc
K iteration:
SMEM Descriptor (A, B)
smem_desc = base_addr | leading_byte_offset | stride_byte_offset | start_addr_off
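The packing above can be sketched as a small host-side C++ helper. This is a minimal sketch, not a library API: `encode_field` and `make_smem_desc` are hypothetical names, and the bit positions, the 16-byte field granularity, the fixed `0b001` constant, and the extra `swizzle_mode` parameter are assumptions based on the PTX ISA matrix-descriptor layout. Mapping `start_addr_off` to bits 49-51 is also an assumption. Check all of these against the PTX documentation for the target architecture.

```cpp
#include <cstdint>

// Encode a byte address or byte offset into a 14-bit descriptor field:
// keep the low 18 bits, then drop the low 4 bits (16-byte units).
constexpr std::uint64_t encode_field(std::uint64_t bytes) {
    return (bytes & 0x3FFFFull) >> 4;
}

// Hypothetical helper. All bit positions below are assumptions.
constexpr std::uint64_t make_smem_desc(std::uint64_t base_addr,
                                       std::uint64_t leading_byte_offset,
                                       std::uint64_t stride_byte_offset,
                                       std::uint64_t start_addr_off,
                                       std::uint64_t swizzle_mode) {
    return encode_field(base_addr)                    // bits 0-13
         | (encode_field(leading_byte_offset) << 16)  // bits 16-29
         | (encode_field(stride_byte_offset) << 32)   // bits 32-45
         | (1ull << 46)                               // bits 46-48: fixed 0b001
         | ((start_addr_off & 0x7ull) << 49)          // bits 49-51
         | ((swizzle_mode & 0x7ull) << 61);           // bits 61-63
}
```

For example, `make_smem_desc(0x1000, 16, 1024, 0, 0)` stores the address and both offsets in 16-byte units, so the three 14-bit fields hold `0x100`, `1`, and `64`.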
tcgen05.mma PTX
tcgen05.mma.cta_group::1.kind::f16 [taddr], a_desc, b_desc, idesc, enable;
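What one such instruction computes can be illustrated with a CPU reference in C++. This is a sketch of the arithmetic only, assuming that `enable` is the predicate selecting between overwrite (C = A × B) and accumulate (C = A × B + C). `mma_reference` is a hypothetical name, and the row-major `float` layout is chosen for readability, not to mirror the SMEM or TMEM layouts.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference arithmetic for one MMA step on row-major matrices:
// a is m x k, b is k x n, c is m x n.
// enable == false overwrites c; enable == true accumulates into c.
void mma_reference(const std::vector<float>& a,
                   const std::vector<float>& b,
                   std::vector<float>& c,
                   std::size_t m, std::size_t n, std::size_t k,
                   bool enable) {
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // Start from the existing accumulator only when enabled.
            float acc = enable ? c[i * n + j] : 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

In a loop over K, the first iteration would typically pass `enable = false` so that stale accumulator contents are discarded, and later iterations pass `enable = true`.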