windows deep internals: May 2026

Thursday, May 28, 2026

RE of PTX grammar from ptxas, part 2

PTX instructions that cicc cannot generate

While reverse-engineering Nvidia's compilation pipeline, I extracted the set of PTX instructions that cicc (the CUDA C++ frontend) is capable of emitting. The next logical step is to compare it with the full set of instructions accepted by ptxas and keep the difference - the instructions which cicc is simply unable to produce. To do this I added a new option -U to iptx.pl and got the file ptx_not_in_cicc.txt with 114 unique names
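The comparison itself boils down to a set difference. Here is a minimal Python sketch of the idea; it is not the actual iptx.pl code (which is Perl), and the function name and the tiny sample lists are made up for illustration:

```python
def not_in_cicc(ptxas_names, cicc_names):
    """Return sorted opcode names accepted by ptxas but never emitted by cicc."""
    # Normalize case and whitespace, and drop duplicates and empty entries.
    ptxas = {n.strip().lower() for n in ptxas_names if n.strip()}
    cicc = {n.strip().lower() for n in cicc_names if n.strip()}
    return sorted(ptxas - cicc)


# Made-up sample data; the real lists come from ptxas and cicc.
ptxas_sample = ["add", "lop3", "cctl", "mad24", "mul24", "ld", "st"]
cicc_sample = ["add", "ld", "st"]

missing = not_in_cicc(ptxas_sample, cicc_sample)
total = len(set(ptxas_sample))
print(missing)
print(f"{len(missing)} of {total} names ({100 * len(missing) / total:.1f}%)")
```

With the real lists the same arithmetic gives 114 of 268 names.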

PTX has only 268 unique names in total - so 114 is 42.5% of them. Notable missing instructions include:

cctl for cache control
lop3 - yeah, I saw it many times in SASS, so presumably it is generated by ptxas during optimization passes
r2p
11 variants of tcgen05.*
mad24/mul24
all video instructions like vadd/vmad/vset etc

This gap is large enough to be surprising and leads me to the conclusion that the official LLVM MLIR dialects for CUDA are totally incomplete

MLIR was a very dubious idea from the start IMHO - what if we have some unscrupulous HW vendor who prefers to hide many details of its hardware? And even worse - when multiple MLIR dialects are involved (like gpu, nvgpu, nvvm, linalg etc), at least one of them has to maintain accurate mappings between all of them. This leads to an exponential explosion of complexity, because you can expect items from each of the used dialects while doing optimization, and it also creates surface area for bugs.

Some instructions are totally undocumented


Sunday, May 24, 2026

RE of PTX grammar from ptxas

Disclaimer

It is highly likely that the author is an illiterate, inattentive, and incompetent lazy person with a poor imagination - therefore his hypotheses may be questionable, his ideas delusional and his analysis simply incorrect. Also, maybe I still haven't mastered IDA Pro in 28 years, so the extracted data can be incomplete or have missing parts. As always, all code is in Perl and therefore offends the aesthetic feelings of believers

Prior works

Official PTX ISA. We all know that nvidia is evil and paranoid, so this document is also incomplete and maliciously conceals information. Proofs are somewhere below in this text
ANTLR ptx grammar - very outdated, based on the cuda-waste parser from 2010
The infamous zluda. It's enough to look at their AST to understand that they support at best a third of the instructions
nvopen-tools by Grigory Evko. AI generated slop, but at least we can borrow the format of instructions and the decoding scheme for arguments from chapter 7

So as you can see, there is no machine-readable grammar for modern PTX. Why is this important at all? Well, according to the "Official guide to inline PTX":

The compiler front end does not parse the asm() statement template string and does not know what it means or even whether it is valid PTX input

Therefore you can successfully compile your buggy code to PTX and then suddenly get mysterious errors when it is loaded dynamically and JIT-compiled. Plus I always suspected that nvidia hides as much information from us as possible
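To show why a machine-readable list of names helps here, below is a rough Python sketch of a pre-check that pulls opcode names out of asm() template strings and compares them with a known set. Everything in it is an assumption for illustration: the regular expression, the function names and the sample source are mine, it only looks at the first string literal of each asm() statement, ignores `__asm__`, and would misreport labels. It is nowhere near a real PTX parser.

```python
import re

# Matches asm("...") / asm volatile("...") and captures the first string literal.
ASM_RE = re.compile(r'\basm\s*(?:volatile\s*)?\(\s*"((?:[^"\\]|\\.)*)"')


def opcodes_in_asm(cuda_source):
    """Yield the base opcode of every PTX statement found in asm() templates."""
    for match in ASM_RE.finditer(cuda_source):
        template = match.group(1)
        # Escaped newlines/tabs and scope braces only separate statements.
        for junk in ("\\n", "\\t", "{", "}"):
            template = template.replace(junk, " ")
        for stmt in template.split(";"):
            tokens = stmt.split()
            # Skip an optional guard predicate such as @p or @!p.
            if tokens and tokens[0].startswith("@"):
                tokens = tokens[1:]
            # Skip empty statements and directives such as .reg.
            if not tokens or tokens[0].startswith("."):
                continue
            # "lop3.b32" -> "lop3": drop the modifiers after the first dot.
            yield tokens[0].split(".")[0]


def unknown_opcodes(cuda_source, known_names):
    """Return opcodes used in inline asm that are absent from known_names."""
    return sorted({op for op in opcodes_in_asm(cuda_source) if op not in known_names})


# Made-up CUDA snippet; "foo" is a deliberately bogus instruction name.
sample_source = r'''
__global__ void k(int *out) {
    int r;
    asm volatile("lop3.b32 %0, %1, %2, %3, 0x96;" : "=r"(r) : "r"(1), "r"(2), "r"(3));
    asm("foo.bar %0;" : "=r"(r));
    *out = r;
}
'''

print(unknown_opcodes(sample_source, {"lop3", "add", "mov"}))
```

The point is only that the frontend accepts both asm() statements without complaint, while even a trivial name lookup flags the second one before ptxas or the JIT ever sees it.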

So I started by disassembling ptxas version V10.1.243 from sdk 13.1, looking for PTX instruction names (encrypted btw)

Data extraction

Instruction attributes are filled in dynamically in two places:

in a huge function at 0xC2341C - extracted data
in an array of functions located at 0x2971260 - data merged with the previous chunk

Please don't ask me why there are 2 separate places. More importantly, the code in both looks uniform
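Merging the two chunks can be pictured roughly like this. The Python sketch below assumes that each source yields a table from instruction name to a set of attribute names; that shape, the function name and the placeholder values are my guesses for illustration, not the real ptxas data layout:

```python
def merge_attributes(primary, secondary):
    """Merge two {instruction name: set of attribute names} tables into one."""
    # Copy the first table so the inputs stay untouched.
    merged = {name: set(attrs) for name, attrs in primary.items()}
    for name, attrs in secondary.items():
        # Union the attribute sets; names seen only in the second source are added.
        merged.setdefault(name, set()).update(attrs)
    return merged


# Placeholder data standing in for the two extraction passes.
from_big_function = {"add": {"attr_a"}, "lop3": {"attr_b"}}
from_function_array = {"add": {"attr_c"}, "cctl": {"attr_d"}}

merged = merge_attributes(from_big_function, from_function_array)
for name in sorted(merged):
    print(name, sorted(merged[name]))
```

Because the code in both places looks uniform, one extractor can feed both tables and the merge stays this trivial.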