windows deep internals: RE of PTX grammar from ptxas

Sunday, May 24, 2026

RE of PTX grammar from ptxas

Disclaimer

It is highly likely that the author is an illiterate, inattentive, incompetent and lazy person with a poor imagination, so his hypotheses may be questionable, his ideas delusional and his analysis simply incorrect. Also, maybe I still haven't mastered IDA Pro in 28 years, so the extracted data can be incomplete or have missing parts. As always, all code is in Perl and therefore offends the aesthetic feelings of believers.

Prior works

Official PTX ISA. We all know that nvidia is evil and paranoid, so this document is also incomplete and maliciously conceals information. Proofs are somewhere below in this text
ANTLR PTX grammar - very outdated, based on the cuda-waste parser from 2010
the infamous zluda. It's enough to look at their AST to see that they support at best a third of the instructions
nvopen-tools by Grigory Evko. AI-generated slop, but at least we can borrow the instruction format and the argument decoding scheme from its chapter 7

So, as you can see, there is no machine-readable grammar for modern PTX. Why is this important at all? Well, according to the "Official guide to inline PTX":

The compiler front end does not parse the asm() statement template string and does not know what it means or even whether it is valid PTX input

Therefore you can successfully compile your buggy code to PTX and then suddenly get mysterious errors when it is JIT-compiled during dynamic loading. Plus, I have always suspected that nvidia hides as much information from us as possible.

So I started by disassembling ptxas version V10.1.243 from SDK 13.1, looking for PTX instruction names (which are encrypted, btw).

Data extraction

Instruction attributes are filled in dynamically in two places:

in a huge function at 0xC2341C - extracted data
in an array of functions located at 0x2971260 - data merged with the previous chunk

Please don't ask me why there are two separate places. What matters more is that the code in both looks uniform:

  pxor    xmm0, xmm0
  sub     rsp, 48h
  lea     rcx, a0000+2         ; "00" - ins operands
  lea     rdx, aEx2            ; "ex2" - ins name
  lea     rsi, aH32h32+3       ; "H32" - operands types
  mov     r8d, 4               ; ins index
  mov     [rsp+48h+var_18], 0
  mov     [rsp+48h+var_38], 0
  movaps  [rsp+48h+var_28], xmm0 ; zero 16 bytes mask
  mov     byte ptr [rsp+48h+var_28], 0A0h ; fill mask with some values
  mov     byte ptr [rsp+48h+var_28+3], 8
  ...

  mov     byte ptr [rsp+48h+var_28+1], 2
  movdqa  xmm0, [rsp+48h+var_28] ; load filled mask

...

  call ptx_ins_register_func

So I wrote a simple emulator (see the function hack_ptx_ops) and extracted both tables.
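A minimal Python sketch of the same idea. It assumes only the instruction shapes visible in the listing above; the author's actual tool is the Perl function hack_ptx_ops, which is not reproduced here. The sketch replays an IDA listing line by line and tracks four things: lea of a string label into rcx/rdx/rsi, the index in r8d, the zeroing of the mask and byte writes into the mask at var_28. It records one row per call of ptx_ins_register_func. The STRINGS table with already decrypted contents is hypothetical, because the real strings are encrypted and must be decrypted first.

```python
import re

# HYPOTHETICAL: IDA string labels mapped to their already-decrypted contents.
STRINGS = {"a0000": "0000", "aEx2": "ex2", "aH32h32": "H32H32"}

LEA = re.compile(r"lea\s+(\w+),\s*(\w+)(?:\+(\d+))?$")
MOV_INDEX = re.compile(r"mov\s+r8d,\s*(\w+)$")
MOV_MASK = re.compile(r"mov\s+byte ptr \[rsp\+48h\+var_28(?:\+(\d+))?\],\s*(\w+)$")


def imm(token):
    """Parse an IDA immediate: '0A0h' is hex, plain digits are decimal."""
    return int(token[:-1], 16) if token.lower().endswith("h") else int(token)


def emulate(listing, strings):
    """Replay a listing and return (index, mask, name, operands, types) per call."""
    regs, mask, rows = {}, bytearray(16), []
    for line in listing.splitlines():
        code = line.split(";")[0].strip()  # drop IDA comments
        if m := LEA.match(code):
            # lea reg, label+offset -> the string starting at that offset
            regs[m.group(1)] = strings[m.group(2)][int(m.group(3) or 0):]
        elif m := MOV_INDEX.match(code):
            regs["r8d"] = imm(m.group(1))
        elif code.startswith("movaps") and "var_28" in code:
            mask = bytearray(16)  # xmm0 was zeroed by pxor, so the mask is cleared
        elif m := MOV_MASK.match(code):
            mask[int(m.group(1) or 0)] = imm(m.group(2))
        elif code.startswith("call") and "ptx_ins_register_func" in code:
            # rdx = name, rcx = operands, rsi = operand types (per the listing comments)
            rows.append((regs["r8d"], bytes(mask), regs["rdx"], regs["rcx"], regs["rsi"]))
    return rows


SNIPPET = """
  pxor    xmm0, xmm0
  lea     rcx, a0000+2         ; "00" - ins operands
  lea     rdx, aEx2            ; "ex2" - ins name
  lea     rsi, aH32h32+3       ; "H32" - operands types
  mov     r8d, 4               ; ins index
  movaps  [rsp+48h+var_28], xmm0
  mov     byte ptr [rsp+48h+var_28], 0A0h
  mov     byte ptr [rsp+48h+var_28+3], 8
  mov     byte ptr [rsp+48h+var_28+1], 2
  call    ptx_ins_register_func
"""

for index, mask, name, operands, types in emulate(SNIPPET, STRINGS):
    print(index, mask.hex(" "), name, operands, types)
```

The real registration code surely has more variants (other registers, other stack slots, masks written with wider moves), so treat these regexes as a starting point rather than a complete decoder.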

Format of each row:

instruction index
16-byte mask
instruction name, followed by a tab
instruction operands, followed by a tab
and finally the operand types as a single string
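Reading such rows back is straightforward. Here is a small Python sketch. It assumes that fields are whitespace-separated, as in the add and exit rows shown later in this post. It also assumes the index is decimal, the mask bytes are hex, and rows without operands (like exit) simply end after the name. These are guesses from the quoted rows, not a documented format.

```python
from dataclasses import dataclass


@dataclass
class InsRow:
    index: int
    mask: bytes    # 16 bytes
    name: str
    operands: str  # e.g. "000"; empty when the row has none
    types: str     # e.g. "F16"; empty when the row has none


def parse_row(line: str) -> InsRow:
    """Parse one dumped row: index, 16 hex mask bytes, name, operands, types.

    split() without arguments accepts both tabs and spaces as separators.
    """
    tok = line.split()
    if len(tok) < 18:
        raise ValueError(f"row too short: {line!r}")
    mask = bytes(int(b, 16) for b in tok[1:17])
    name, operands, types = (tok[17:] + ["", ""])[:3]
    return InsRow(int(tok[0]), mask, name, operands, types)


row = parse_row("48 20 12 00 08 " + "00 " * 12 + "add\t000\tF16")
print(row.index, row.mask.hex(" "), row.name, row.operands, row.types)
```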

I got 270 unique instruction names and 1420 rows.

Verification

The next logical question arises: how can we make sure that we have extracted all the data? Well, earlier I had extracted a huge list of PTX instructions from their cicc, so I wrote a simple Perl script to check the intersection.

Yes, every PTX instruction from cicc is present in the dumped data. Of course, this doesn't mean that I dumped everything, but it gives me some confidence.
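The check itself boils down to a set difference. Below is a minimal Python sketch; the author's Perl script is not shown. It assumes the cicc list contains full mnemonics such as add.sat.s32, while the dump keys rows by the bare instruction name. The sample lists are made up for illustration.

```python
from typing import Iterable, List


def base_name(mnemonic: str) -> str:
    """'add.sat.s32' -> 'add': keep only the part before the first dot."""
    return mnemonic.split(".", 1)[0]


def missing_from_dump(cicc_mnemonics: Iterable[str], dumped_names: Iterable[str]) -> List[str]:
    """Base names that cicc emits but the ptxas dump does not contain."""
    dumped = set(dumped_names)
    return sorted({base_name(m) for m in cicc_mnemonics} - dumped)


cicc = ["add.sat.s32", "ex2.approx.f32", "exit"]
dump = ["add", "ex2", "exit", "genmetadata"]
print(missing_from_dump(cicc, dump))  # an empty list means cicc is fully covered
```

An empty result only shows that cicc's vocabulary is a subset of the dump. That is exactly the limited kind of confidence described above.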

Analysis

The first questionable hypothesis: these 16-byte masks are bitfields of instruction attributes. For example, in add.sat.s32 the instruction name must be add, while sat and s32 each correspond to some set bit in the mask.
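Under this hypothesis every 1 bit stands for one attribute, so the first tool needed is a way to turn a mask into a list of bit indices. The following is a small Python sketch. The numbering, where bit 0 of byte 0 is index 0, is an assumption: it follows from the mask being loaded into an XMM register as a little-endian 128-bit word. The example mask is the add F16 row shown further down.

```python
from typing import List


def set_bits(mask: bytes) -> List[int]:
    """Indices of the set bits in a mask.

    Byte 0 is treated as the least significant byte, which is how movdqa loads
    the 16 bytes into an XMM register on x86; bit 0 of byte 0 is index 0.
    """
    value = int.from_bytes(mask, "little")
    return [i for i in range(len(mask) * 8) if (value >> i) & 1]


# mask bytes of the "add ... F16" row from the dump
add_f16 = bytes.fromhex("20 12 00 08" + " 00" * 12)
print(set_bits(add_f16))
```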

Let's just look at the dumped data to check how dense the bit masks are and whether they are the same for each instruction:

48 20 12 00 08 00 00 00 00 00 00 00 00 00 00 00 00 add 000 F16

48 20 10 00 02 00 00 00 00 00 00 00 00 00 00 00 00 add 000 I

Wait, WHAT? The masks for the same instruction differ depending on the operand types. How is this possible?

Well, one possible answer: the parser delays parsing of attributes until it knows the operand types. Frankly, I don't remember where I saw this idea, perhaps in the book "Parsing Techniques: A Practical Guide", which I read in the last century.
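Here is a toy Python illustration of that idea. It is not a claim about how ptxas is actually written. The modifiers of a mnemonic are only collected at first, and the mask row is picked once the type is known. The two rows are the add rows quoted above. TYPE_CLASS, which maps PTX type suffixes to the type strings seen in the dump, is a hypothetical guess.

```python
from typing import Dict, List, Tuple

# The two add rows from the dump: (name, operand types) -> 16-byte mask.
MASKS: Dict[Tuple[str, str], bytes] = {
    ("add", "F16"): bytes.fromhex("20120008" + "00" * 12),
    ("add", "I"): bytes.fromhex("20100002" + "00" * 12),
}

# HYPOTHETICAL: PTX type suffix -> type string as it appears in the dump.
TYPE_CLASS = {"f16": "F16", "s32": "I", "u32": "I"}


def resolve(mnemonic: str) -> Tuple[str, List[str], bytes]:
    """Split e.g. 'add.sat.s32'; choose the mask only after the type is known."""
    name, *suffixes = mnemonic.split(".")
    # Modifiers are only collected here; nothing is decided about them yet.
    modifiers = [s for s in suffixes if s not in TYPE_CLASS]
    types = "".join(TYPE_CLASS[s] for s in suffixes if s in TYPE_CLASS)
    key = (name, types)
    if key not in MASKS:
        raise ValueError(f"no grammar row for {key}")
    return name, modifiers, MASKS[key]


print(resolve("add.sat.s32"))
```

In a parser of this shape, the same modifier spelling could land on different bits for different operand types. That would be one way to explain the two different add masks.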

So I counted the number of distinct bits that are set in at least one mask: 113
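With 1420 rows of several bits each, the total number of ones would be far above 113. So the count presumably means distinct bit positions: an OR over all masks followed by a popcount. A Python sketch of that, assuming masks are 16-byte values like those in the rows above:

```python
from typing import Iterable


def distinct_bits(masks: Iterable[bytes]) -> int:
    """Count bit positions that are set in at least one mask (popcount of their OR)."""
    union = 0
    for mask in masks:
        union |= int.from_bytes(mask, "little")
    return bin(union).count("1")


add_f16 = bytes.fromhex("20120008" + "00" * 12)
add_i = bytes.fromhex("20100002" + "00" * 12)
exit_ = bytes.fromhex("18" + "00" * 15)
print(distinct_bits([add_f16, add_i, exit_]))
```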

Attribute tables

and then I went on digging through the disassembly. Actually, this was the most tedious part of the work: finding and extracting over a hundred tables with encrypted strings. Result:

ls -1 tabs/ | wc -l
113

Hallelujah, the books balance: 113 set bits and 113 tables.

Analysis 2

OK, I have masks per instruction and I have tables, but how do I link them? Let's pick some attribute and place it at index 0; then for index 1 we still have 112 candidates, and so on. In essence, there are 113! possible assignments.
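For scale, the number of ways to assign 113 tables to 113 bit positions with no constraints on the mapping is shown below. The magnitude comes from Stirling's approximation.

```latex
N = 113 \cdot 112 \cdot 111 \cdots 2 \cdot 1 = 113! \approx 2.2 \times 10^{184}
```

Brute force is therefore out of the question, and every extra constraint on the mapping matters.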

So I made a couple more hypotheses:

masks must preserve the order of attributes
because masks are used as a 128-bit word in an XMM register, the set bit with the lowest index must correspond to the leftmost attribute. For example, the attribute at index 1 must precede the attribute at index 2
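If both hypotheses hold, a candidate assignment of tables to bits can be rejected as soon as it contradicts the order in which attributes are written. Below is a hedged Python sketch of that filter. The attribute names and bit indices in the example are made up, and the rule that one attribute maps to exactly one bit is itself part of the hypothesis.

```python
from typing import Dict, List


def respects_order(known: Dict[str, int], attrs_in_text: List[str]) -> bool:
    """Ordering hypothesis: left-to-right attributes must have increasing bit indices.

    known         attribute -> bit index assigned so far (unknown ones are skipped)
    attrs_in_text attributes in the order they are written in the PTX mnemonic
    """
    indices = [known[a] for a in attrs_in_text if a in known]
    return all(x < y for x, y in zip(indices, indices[1:]))


# Made-up assignment, only to show the check.
guess = {"rn": 7, "ftz": 10, "sat": 12}
print(respects_order(guess, ["rn", "ftz", "sat"]))  # True, e.g. add.rn.ftz.sat.f32
print(respects_order(guess, ["sat", "rn"]))         # False, contradicts the guess
```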

So I added several simple set-operation options to my script:

-f - frequency analysis of each non-zero bit across the masks
-a - intersection of the masks of several instructions
-i - intersection of the masks of several instructions minus the masks of the remaining instructions
-o - union of the masks of several instructions minus the masks of the remaining instructions

All masks identified so far are stored in the map gk_tabs and can be ignored with the -k option.
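Below is a Python sketch of what these options could look like, with each instruction row reduced to the set of its bit indices. Two details are assumptions and do not describe the author's Perl script: "remaining" is read as every row not selected, and the known parameter stands in for gk_tabs and -k. The sample bit sets are derived from the add and exit rows quoted in this post, using little-endian bit numbering (also an assumption).

```python
from collections import Counter
from typing import AbstractSet, Dict, List, Set

Rows = Dict[str, Set[int]]  # row label, e.g. "add F16" -> indices of its set bits
NONE: AbstractSet[int] = frozenset()


def others(rows: Rows, keys: List[str]) -> Set[int]:
    """Union of the bits of every row that is not selected."""
    out: Set[int] = set()
    for key, bits in rows.items():
        if key not in keys:
            out |= bits
    return out


def frequency(rows: Rows, known: AbstractSet[int] = NONE) -> Counter:
    """-f: in how many rows each bit is set, skipping already identified bits."""
    counts: Counter = Counter()
    for bits in rows.values():
        counts.update(bits - known)
    return counts


def intersection(rows: Rows, keys: List[str], known: AbstractSet[int] = NONE) -> Set[int]:
    """-a: bits set in every selected row."""
    return set.intersection(*(rows[k] for k in keys)) - known


def intersection_only(rows: Rows, keys: List[str], known: AbstractSet[int] = NONE) -> Set[int]:
    """-i: bits set in every selected row and in none of the remaining rows."""
    return intersection(rows, keys, known) - others(rows, keys)


def union_only(rows: Rows, keys: List[str], known: AbstractSet[int] = NONE) -> Set[int]:
    """-o: bits set in any selected row and in none of the remaining rows."""
    return set().union(*(rows[k] for k in keys)) - others(rows, keys) - known


rows: Rows = {"add F16": {5, 9, 12, 27}, "add I": {5, 12, 25}, "exit": {3, 4}}
print(frequency(rows).most_common(2))
print(intersection_only(rows, ["add F16", "add I"]))
print(union_only(rows, ["add F16"], known={9}))
```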

And a couple of words on why nvidia, as usual, lies:

there are completely undocumented instructions, like genmetadata/spmetadata
there are completely undocumented attributes, for example ignoreC_pred/ignoreC/frel
some instructions have attributes that are not present in the official documentation

For example, exit, whose mask is not empty:

202 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 exit

Current status

In a couple of days I was able to identify 26 attribute tables, only 23% of 113.

So if you are passionate about PTX and love digging into unstructured data and performing set operations, your help is welcome.
