Thursday, May 28, 2026

RE of PTX grammar from ptxas, part 2

instructions that cicc cannot generate

It occurred to me that we could also subtract the set of PTX instruction names found in cicc from the full set, and so get the instructions that cicc is simply unable to produce. So I added a new option -U to iptx.pl and got the file ptx_not_in_cicc.txt with 114 unique names.
By the way, PTX has only 268 unique names in total, so 114 is 42.5%. Some remarkable instructions that are missing:
  • cctl for cache control
  • lop3 - yeah, I have seen it many times in SASS, so it must be generated later by ptxas during its optimization passes
  • r2p
  • and all video instructions
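
The subtraction behind the -U option can be sketched as a plain set difference. This is not the code from iptx.pl (which is Perl), only a minimal Python illustration; the function names and the tiny name sets are hypothetical stand-ins for the real lists extracted from ptxas and cicc.

```python
def names_missing(all_names, produced_names):
    """Return the sorted names that are in all_names but not in produced_names."""
    return sorted(set(all_names) - set(produced_names))


def share_percent(part, total):
    """Return part/total as a percentage rounded to one decimal place."""
    return round(100.0 * part / total, 1)


# Hypothetical sample data, NOT the real extracted lists.
ptx_names = {"add", "mov", "cctl", "lop3", "r2p"}
cicc_names = {"add", "mov"}

print(names_missing(ptx_names, cicc_names))  # names that cicc never emits
print(share_percent(114, 268))               # the ratio quoted in the text
```

With the counts from the text, share_percent(114, 268) reproduces the 42.5% figure.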
 
And this led me to the conclusion that the official MLIR for CUDA is totally incomplete

MLIR was a very dubious idea from the start, IMHO - what if we have some unscrupulous HW vendor who prefers to hide many details of its hardware? And even worse - usually you use several MLIR dialects (like gpu, nvgpu, nvvm, linalg etc), so at least one of them must be aware of all the others. And this leads to an exponential explosion of complexity - while doing optimization you can expect items from each of the dialects in use


some instructions are totally undocumented

Sunday, May 24, 2026

RE of PTX grammar from ptxas

Disclaimer

It is highly likely that the author is an illiterate, inattentive, and incompetent lazy person with a poor imagination - therefore his hypotheses may be questionable, his ideas delusional and his analysis simply incorrect. Also, maybe I still haven't mastered ida pro in 28 years, so the extracted data can be incomplete or have missing parts. As always, all code is in Perl and therefore offends the aesthetic feelings of believers

 

Prior works

  • Official PTX ISA. We all know that nvidia is evil and paranoid, so this document is also incomplete and maliciously conceals information. Proof is somewhere below in this text
  • ANTLR ptx grammar - very outdated, based on the cuda-waste parser from 2010
  • The infamous zluda. It's enough to look at their AST to understand that they support at best a third of the instructions
  • nvopen-tools by Grigory Evko. AI-generated slop, but at least we can borrow the instruction format and the argument decoding scheme from chapter 7

So as you can see, there is no machine-readable grammar for modern PTX. Why is this important at all? Well, according to the "Official guide to inline PTX":

The compiler front end does not parse the asm() statement template string and does not know what it means or even whether it is valid PTX input

Therefore you can successfully compile your buggy code to PTX and then suddenly get mysterious errors during dynamic loading via JIT. Plus, I always suspected that nvidia hides as much information from us as possible
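
To illustrate what a machine-readable list of names would buy us, here is a rough Python sketch that checks the opcodes in an inline asm template before anything reaches the JIT. Everything in it is an assumption made for illustration: the KNOWN_NAMES set is a tiny hypothetical subset, the function name is invented, and the parsing is deliberately naive (it handles guard predicates, braces and directives, but not labels or comments).

```python
import re

# Hypothetical subset; a real list would come from the extracted grammar.
KNOWN_NAMES = {"add", "mov", "lop3", "cvt"}


def unknown_opcodes(template, known=KNOWN_NAMES):
    """Return base opcode names from an asm template that are not in `known`."""
    unknown = []
    for stmt in template.split(";"):
        # Drop surrounding whitespace and scope braces.
        stmt = stmt.strip().strip("{}").strip()
        if not stmt:
            continue
        # Drop an optional guard predicate such as "@p" or "@!%p1".
        stmt = re.sub(r"^@!?[%\w]+\s+", "", stmt)
        if not stmt:
            continue
        # "add.s32" -> "add"; directives like ".reg" give an empty base and are skipped.
        base = stmt.split()[0].split(".")[0]
        if base and base not in known:
            unknown.append(base)
    return unknown


print(unknown_opcodes("add.s32 %0, %1, %2;\n\tmvo.u32 %0, %1;"))
```

A check like this only catches misspelled names; validating modifiers and operand types would need the full per-instruction format, which is exactly the data being extracted here.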
 
So I started by disassembling ptxas version V10.1.243 from sdk 13.1, looking for PTX instruction names (encrypted, btw)

 

Data extracting

Instruction attributes are filled in dynamically in two places:
  • in a huge function at 0xC2341C - extracted data
  • in an array of functions located at 0x2971260 - data merged with the previous chunk
Please don't ask me why there are 2 separate places. More importantly, the code in both looks uniform