Tensor Core
Transpose A: No | Yes
Transpose B: No | Yes
M (output rows): 16 | 32 | 64 | 128
N (output cols): 16 | 32 | 64 | 128
K: 16
Row group | Col group
A (128×128) × B (128×128) → C
K iteration: