[Figure: Tensor Core. Interactive view of C = A × B with A (128×128) and B (128×128), stepped one K iteration at a time. Controls: Transpose A, Transpose B, M (output rows), N (output cols), K, row group, col group.]
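The figure builds C by stepping along the shared K dimension. The plain-Python sketch below shows that accumulation pattern: each K iteration adds the partial product of one slice of A's columns and the matching slice of B's rows into C, with optional transposes of A and B applied first. It models only the arithmetic, not Tensor Core hardware. The function name `matmul_k_tiled` and the `k_tile` parameter are hypothetical, and the figure's row group and col group controls are not modeled.

```python
def matmul_k_tiled(a, b, k_tile, transpose_a=False, transpose_b=False):
    """Compute C = op(A) @ op(B), accumulating one K tile per iteration.

    op(X) is X, or its transpose when the matching flag is set.
    Hypothetical helper for illustration only.
    """
    def transpose(x):
        return [list(row) for row in zip(*x)]

    if transpose_a:
        a = transpose(a)
    if transpose_b:
        b = transpose(b)

    m, k = len(a), len(a[0])      # A is M x K
    k_b, n = len(b), len(b[0])    # B is K x N
    if k != k_b:
        raise ValueError(f"inner dimensions differ: {k} vs {k_b}")

    c = [[0] * n for _ in range(m)]  # C is M x N, starts at zero
    # Outer loop: one "K iteration" per tile along the shared dimension.
    for k0 in range(0, k, k_tile):
        k1 = min(k0 + k_tile, k)
        # Add the partial product A[:, k0:k1] @ B[k0:k1, :] into C.
        for i in range(m):
            for j in range(n):
                acc = 0
                for kk in range(k0, k1):
                    acc += a[i][kk] * b[kk][j]
                c[i][j] += acc
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul_k_tiled(a, b, k_tile=1))  # [[19, 22], [43, 50]]
```

The tile size changes only how many K iterations it takes to finish, not the result: with `k_tile=1` the 2×2 example needs two iterations, with `k_tile=2` a single one, and both produce the same C.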