Human
Humans typically reason in terms of high-level operations over tensor slices (tiles), e.g.:
- copy
- GEMM
- add
- fill
Such subroutines can be reused.
Operating directly at the lowest level can be painful, as you likely noticed in the previous example.
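To make the tile-level style concrete, here is a minimal pure-Python sketch of the four operations listed above, written as reusable subroutines over rectangular views of a flat buffer. The names `Tile`, `tensor`, `tile`, and `tiled_matmul` are hypothetical helpers invented for this illustration and are not part of any real API; the point is only that a tiled matrix multiply becomes a few lines once `fill` and `gemm` exist.

```python
# Toy model: a "tile" is a rectangular view into a row-major 2-D buffer.
# Everything here is hypothetical illustration code, not a real kernel API.
from dataclasses import dataclass


@dataclass
class Tile:
    buf: list      # flat row-major storage shared with the parent tensor
    stride: int    # elements per row of the parent tensor
    r0: int        # first row of the tile within the parent
    c0: int        # first column of the tile within the parent
    rows: int
    cols: int

    def get(self, i, j):
        return self.buf[(self.r0 + i) * self.stride + (self.c0 + j)]

    def set(self, i, j, value):
        self.buf[(self.r0 + i) * self.stride + (self.c0 + j)] = value


def tensor(rows, cols, value=0):
    """Allocate a whole tensor, represented as a tile covering everything."""
    return Tile([value] * (rows * cols), cols, 0, 0, rows, cols)


def tile(t, r0, c0, rows, cols):
    """Slice a sub-tile; it aliases the parent's storage."""
    return Tile(t.buf, t.stride, t.r0 + r0, t.c0 + c0, rows, cols)


def fill(dst, value):
    for i in range(dst.rows):
        for j in range(dst.cols):
            dst.set(i, j, value)


def copy(dst, src):
    assert (dst.rows, dst.cols) == (src.rows, src.cols)
    for i in range(dst.rows):
        for j in range(dst.cols):
            dst.set(i, j, src.get(i, j))


def add(dst, a, b):
    for i in range(dst.rows):
        for j in range(dst.cols):
            dst.set(i, j, a.get(i, j) + b.get(i, j))


def gemm(acc, a, b):
    """acc += a @ b, all operands being tiles."""
    assert a.cols == b.rows and (acc.rows, acc.cols) == (a.rows, b.cols)
    for i in range(acc.rows):
        for j in range(acc.cols):
            s = acc.get(i, j)
            for k in range(a.cols):
                s += a.get(i, k) * b.get(k, j)
            acc.set(i, j, s)


def tiled_matmul(A, B, C, T):
    """C = A @ B, composed only from tile-level subroutines (T divides all dims)."""
    for i in range(0, C.rows, T):
        for j in range(0, C.cols, T):
            c = tile(C, i, j, T, T)
            fill(c, 0)
            for k in range(0, A.cols, T):
                gemm(c, tile(A, i, k, T, T), tile(B, k, j, T, T))
```

Nothing in `tiled_matmul` mentions individual elements or addresses; that detail lives inside the subroutines, which is what makes them reusable.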
How Users Think
Agent
When agents handle instruction-level details directly, errors are easy to introduce. Thinking in terms of higher-level kernel algorithms is less error-prone, keeps programs concise, makes changes easier (an agent's context is always finite), and makes optimizations easier to compose.
TIRx Mechanism
TIRx provides a first-class mechanism for this: deterministic operator dispatch.
The layout (swizzle + tile) attached to each tensor determines the logical-to-physical mapping and provides efficient access to certain tile-slice patterns.
Execution scopes are explicit (kernel/CTA/warpgroup/warp/thread), defining exactly which thread group executes each operator.
Tile-level operators (Tx.copy, Tx.gemm_async, Tx.cast) operate on tensor tile slices, matching high-level reasoning. They are deterministically dispatched to low-level PTX based on layout, slice, and execution scope.
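The three ingredients above can be sketched as a toy model in plain Python. This is not TIRx code and does not reflect how TIRx actually selects instructions: `Layout`, `RULES`, and `dispatch` are hypothetical names, the swizzle is a generic XOR on the column index, and the lookup ignores the slice shape for brevity. The instruction names in the table are real PTX instructions, but their pairing with operators and scopes here is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Toy layout: a memory space plus an optional XOR swizzle on columns."""
    space: str              # "global", "shared", or "register"
    cols: int               # row length in elements; a power of two when swizzled
    swizzled: bool = False

    def offset(self, row, col):
        # Logical (row, col) -> physical element offset.
        if self.swizzled:
            col ^= row % self.cols
        return row * self.cols + col


# Hypothetical rule table keyed by (operator, scope, source space, destination space).
# Illustrative pairing only; not taken from TIRx.
RULES = {
    ("copy", "thread", "global", "shared"): "cp.async",
    ("copy", "warp", "shared", "register"): "ldmatrix",
    ("gemm_async", "warpgroup", "shared", "register"): "wgmma.mma_async",
}


def dispatch(op, scope, src, dst):
    """Return the lowering for an operator call, or fail loudly.

    The result is a pure function of the arguments, so the same call site
    always lowers the same way; there is no search or heuristic fallback.
    """
    key = (op, scope, src.space, dst.space)
    if key not in RULES:
        raise LookupError(f"no lowering registered for {key}")
    return RULES[key]


plain = Layout("shared", cols=4)
swz = Layout("shared", cols=4, swizzled=True)

# Walking down logical column 0: which physical column does each row land in?
print([plain.offset(r, 0) % 4 for r in range(4)])  # [0, 0, 0, 0]
print([swz.offset(r, 0) % 4 for r in range(4)])    # [0, 1, 2, 3]

print(dispatch("copy", "warp", swz, Layout("register", cols=4)))  # ldmatrix
```

In this toy, the swizzled layout spreads a column access across four distinct physical columns, which is the kind of tile-slice pattern a layout can make efficient. Because an unmatched combination raises an error instead of falling back silently, the outcome of every operator call is predictable from its layout and scope.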