Tensor Core
Transpose A: No / Yes
Transpose B: No / Yes
N (output rows): 16 / 32 / 64 / 128
M (output cols): 16 / 32 / 64 / 128
K: 16
Row group, Col group
A (128×128) × B (128×128) → C
K iteration: