tcgen05.mma
Transpose A: No | Yes
Transpose B: No | Yes
M (output rows): 128
N (output cols): 16 | 32 | 64 | 128
K: 16
[Diagram: A (SMEM) × B (SMEM), Tensor Core tcgen05.mma → C (TMEM). Highlights: Row group, Col group. C (TMEM) spans 128 Lanes (M) by Col (N), with the columns allocated by tmem.alloc.]
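The shape options and the TMEM layout above can be turned into a few host-side checks. The following is a minimal sketch, not the page's own code. The helper names (`is_listed_shape`, `k_iterations`, `alloc_columns`) are hypothetical. It assumes a 32-bit accumulator, so an M = 128 tile occupies N TMEM columns, and it assumes that column allocations are rounded up to a power of two of at least 32 columns.

```cpp
#include <cstdint>

// True only for the shapes listed above: M = 128, N in {16, 32, 64, 128}, K = 16.
// The hardware may accept other shapes; this mirrors the listed options only.
constexpr bool is_listed_shape(int m, int n, int k) {
    return m == 128 && (n == 16 || n == 32 || n == 64 || n == 128) && k == 16;
}

// Number of MMA issues needed to cover a reduction dimension of k_total
// when each instruction consumes K = 16 (ceiling division, so a partial
// final slice would need padding).
constexpr int k_iterations(int k_total) {
    return (k_total + 15) / 16;
}

// Assumption: one 32-bit accumulator element per lane per column, and the
// allocation granularity is a power of two with a minimum of 32 columns.
constexpr std::uint32_t alloc_columns(std::uint32_t n) {
    std::uint32_t cols = 32;
    while (cols < n) {
        cols *= 2;
    }
    return cols;
}
```

For example, under these assumptions an N = 16 tile still reserves 32 columns, while N = 128 reserves exactly 128.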
K iteration:
SMEM Descriptor (A, B)
smem_desc = base_addr | leading_byte_offset | stride_byte_offset | start_addr_off
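To make the OR-composition above concrete, here is a hedged host-side sketch of packing a 64-bit descriptor. The bit positions are assumptions that the page does not state: the start address in bits 0..13, the leading byte offset in bits 16..29 and the stride byte offset in bits 32..45, each stored in units of 16 bytes. Because the page does not say where `start_addr_off` lives, the sketch takes it as an already-positioned bit pattern and ORs it in unchanged. `encode14` and `make_smem_desc` are hypothetical names. Check the PTX ISA documentation before relying on any of these positions.

```cpp
#include <cstdint>

// Assumed encoding: a byte quantity stored in 16-byte units, 14 bits wide.
constexpr std::uint64_t encode14(std::uint64_t bytes) {
    return (bytes >> 4) & 0x3FFFull;
}

// Compose the descriptor by OR-ing the fields, as in the formula above.
// Field positions are assumptions for illustration only.
constexpr std::uint64_t make_smem_desc(std::uint32_t base_addr,
                                       std::uint32_t leading_byte_offset,
                                       std::uint32_t stride_byte_offset,
                                       std::uint64_t start_addr_off_bits) {
    return (encode14(base_addr) << 0)             // assumed bits 0..13
         | (encode14(leading_byte_offset) << 16)  // assumed bits 16..29
         | (encode14(stride_byte_offset) << 32)   // assumed bits 32..45
         | start_addr_off_bits;                   // caller positions this field
}
```

Under these assumptions, `make_smem_desc(0x1000, 16, 1024, 0)` yields `0x4000010100`: 0x100 for the address, 1 for the leading offset and 0x40 for the stride offset, each in 16-byte units.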
tcgen05.mma PTX
tcgen05.mma.cta_group::1.kind::f16 taddr, a_desc, b_desc, idesc, enable;
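The role of the `enable` operand across the K iteration can be illustrated with a scalar reference model on the host. This is only a sketch of the accumulate semantics, not of the hardware data path. It assumes that `enable == false` makes the instruction overwrite the accumulator (C = A·B) and that `enable == true` makes it accumulate (C += A·B), so the first K slice clears whatever was previously in TMEM. `mma_step` and `gemm_k_loop` are hypothetical names, and A and B are plain row-major arrays here rather than descriptor-addressed SMEM tiles.

```cpp
#include <cstddef>
#include <vector>

// Scalar stand-in for one MMA issue over the K slice [k0, k0 + k_step).
// a is m x k_total (row-major), b is k_total x n (row-major), c is m x n.
void mma_step(const std::vector<float>& a, const std::vector<float>& b,
              std::vector<float>& c, std::size_t m, std::size_t n,
              std::size_t k_total, std::size_t k0, std::size_t k_step,
              bool enable) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // enable == false discards the old accumulator value.
            float acc = enable ? c[i * n + j] : 0.0f;
            for (std::size_t kk = k0; kk < k0 + k_step && kk < k_total; ++kk) {
                acc += a[i * k_total + kk] * b[kk * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}

// Walk the reduction dimension in slices of K = 16, enabling accumulation
// on every slice except the first.
void gemm_k_loop(const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c, std::size_t m, std::size_t n,
                 std::size_t k_total) {
    const std::size_t k_step = 16;
    for (std::size_t k0 = 0; k0 < k_total; k0 += k_step) {
        mma_step(a, b, c, m, n, k_total, k0, k_step, /*enable=*/k0 != 0);
    }
}
```

Because the first slice runs with `enable` set to false, stale values in `c` do not leak into the result, which is why the accumulator does not need to be zeroed separately in this model.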