windows deep internals: May 2026

Thursday, May 28, 2026

RE of PTX grammar from ptxas, part 2

instructions that cicc cannot generate

The idea occurred to me that we could also subtract the PTX instructions found in cicc from the full set and so get the instructions which cicc is simply unable to produce. So I added a new option -U to iptx.pl and got the file ptx_not_in_cicc.txt with 114 unique names

Btw PTX in total has only 268 unique names - so 114 is 42.5%. So which remarkable instructions are missing:
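
The subtraction itself is just a set difference over instruction names. Below is a minimal sketch in Python, not the actual iptx.pl code (which is perl and not shown here); the function name `names_not_in` and the tiny input lists are made up purely for illustration.

```python
def names_not_in(all_names, produced_names):
    """Return the sorted names present in all_names but absent from produced_names."""
    return sorted(set(all_names) - set(produced_names))


# Toy data only: a handful of opcode names, NOT the real extracted lists.
ptxas_names = ["add", "cctl", "lop3", "mov", "r2p"]
cicc_names = ["add", "mov"]

missing = names_not_in(ptxas_names, cicc_names)
print(missing)
```

With the real inputs, the first list would hold every name known to ptxas and the second every name found in cicc.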

cctl for cache control
lop3 - yeah, I saw it many times in SASS, so it is generated later by ptxas during optimization passes
r2p
and all video instructions

And this led me to the conclusion that the official MLIR for cuda is totally incomplete

MLIR was initially a very dubious idea IMHO - what if we have some unscrupulous HW vendor who prefers to hide many details of its hardware? And even worse - usually you use several MLIR dialects (like gpu, nvgpu, nvvm, linalg etc), so at least one of them must be aware of all the others. And this leads to an exponential explosion of complexity - you can expect items from each of the used dialects while doing optimization

some instructions are totally undocumented


Sunday, May 24, 2026

RE of PTX grammar from ptxas

Disclaimer

It is highly likely that the author is an illiterate, inattentive, and incompetent lazy person with a poor imagination - therefore his hypotheses may be questionable, his ideas delusional and his analysis simply incorrect. Also maybe I still haven't mastered ida pro in 28 years, so the extracted data can be incomplete or have missing parts. As always all code is in perl and therefore offends the aesthetic feelings of believers

Prior works

Official PTX ISA. We all know that nvidia is evil and paranoid, so this document is also incomplete and maliciously conceals information. Proofs are somewhere below in this text
ANTLR ptx grammar - very outdated, based on cuda-waste parser from 2010
infamous zluda. It's enough to look at their AST to understand that they support at best a third of the instructions
nvopen-tools by Grigory Evko. AI generated slop, but at least we can borrow the format of instructions and the decoding scheme for arguments from chapter 7

So as you can see there is no machine readable grammar for modern PTX. Why is this important at all? Well, according to "Official guide to inline PTX"

The compiler front end does not parse the asm() statement template string and does not know what it means or even whether it is valid PTX input

Therefore you can successfully compile your buggy code to PTX and then suddenly get mysterious errors during dynamic loading via JIT. Plus I always suspected that nvidia hides as much information from us as possible
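
To show why a machine readable list of names would help here, below is a rough Python sketch of a pre-check that pulls base opcode names out of an inline asm template and reports the ones missing from a known set. Everything in it is an assumption made for illustration: the helper names, the naive split on `;`, and the tiny `known` set. It is not a PTX parser and will misread labels, comments and directives.

```python
import re


def opcodes_in_template(template):
    """Extract the base opcode (text before the first '.') of each statement."""
    found = []
    for stmt in template.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        # Skip an optional guard predicate such as "@%p0" or "@!%p0".
        m = re.match(r"(?:@!?\S+\s+)?([A-Za-z_][A-Za-z0-9_]*)", stmt)
        if m:
            found.append(m.group(1))
    return found


def unknown_opcodes(template, known):
    """Return opcodes from the template that are not in the known set."""
    return [op for op in opcodes_in_template(template) if op not in known]


# Toy data: a deliberately misspelled second instruction and a tiny known set.
known = {"add", "mov"}
template = "add.s32 %0, %1, %2; addd.s32 %0, %0, %1;"
print(unknown_opcodes(template, known))
```

A check like this only catches misspelled names; validating modifiers and operand types would need the full grammar.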

So I started with some disassembly of ptxas version V10.1.243 from sdk 13.1 looking for PTX instruction names (encrypted btw)

Data extracting

Instruction attributes are filled in dynamically in two places

in huge function at 0xC2341C - extracted data
in array of functions located at 0x2971260 - data merged with previous chunk

Please don't ask me why there are 2 separate places. More importantly, the code in both looks uniform
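
Merging the two chunks can be pictured as combining two name-to-attributes tables. The Python sketch below is only an illustration under loud assumptions: the real layout of the extracted data is not described here, so the names `insn_a`, `attr_x` and so on are placeholders, and the rule that the second source wins on a clash is my guess, not something taken from ptxas.

```python
def merge_attributes(first, second):
    """Merge two name -> attribute-dict tables; values from `second` win on clashes."""
    # Copy the inner dicts so the inputs are left untouched.
    merged = {name: dict(attrs) for name, attrs in first.items()}
    for name, attrs in second.items():
        merged.setdefault(name, {}).update(attrs)
    return merged


# Placeholder data, NOT real ptxas attributes.
from_big_function = {"insn_a": {"attr_x": 1}}
from_function_array = {"insn_a": {"attr_y": 2}, "insn_b": {"attr_x": 3}}

merged = merge_attributes(from_big_function, from_function_array)
print(merged)
```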