Tensor Core

[Interactive diagram: A (128×128) × B (128×128), with output C. Controls: Transpose A, Transpose B, N (output rows), M (output cols), K, Row group, Col group. Readout: K iteration.]