Sunday, May 24, 2026

RE of PTX grammar from ptxas

Disclaimer

It is highly likely that the author is an illiterate, inattentive, and incompetent lazy person with a poor imagination, so his hypotheses may be questionable, his ideas delusional, and his analysis simply incorrect. Also, maybe I still haven't mastered IDA Pro in 28 years, so the extracted data can be incomplete or have missing parts. As always, all the code is in Perl and therefore offends the aesthetic feelings of believers

 

Prior works

  • Official PTX ISA. We all know that nvidia is evil and paranoid, so this document is also incomplete and maliciously conceals information. Proofs are somewhere below in this text
  • ANTLR ptx grammar - very outdated, based on the cuda-waste parser from 2010
  • infamous zluda. It's enough to look at their AST to understand that they support at best a third of the instructions
  • nvopen-tools by Grigory Evko. AI generated slop, but at least we can borrow the instruction format and the argument decoding scheme from chapter 7

So, as you can see, there is no machine-readable grammar for modern PTX. Why is this important at all? Well, according to the "Official guide to inline PTX":

The compiler front end does not parse the asm() statement template string and does not know what it means or even whether it is valid PTX input

Therefore you can successfully compile your buggy code to PTX and then suddenly get mysterious errors when it is loaded dynamically through the JIT. Plus, I always suspected that nvidia hides as much information from us as possible
 
So I started by disassembling ptxas version V10.1.243 from sdk 13.1, looking for PTX instruction names (encrypted, btw)

 

Data extracting

Instruction attributes are filled in dynamically in two places:
  • in a huge function at 0xC2341C - extracted data
  • in an array of functions located at 0x2971260 - data merged with the previous chunk
Please don't ask me why there are 2 separate places. More importantly, the code in both looks uniform:
  pxor    xmm0, xmm0
  sub     rsp, 48h
  lea     rcx, a0000+2         ; "00" - ins operands
  lea     rdx, aEx2            ; "ex2" - ins name
  lea     rsi, aH32h32+3       ; "H32" - operands types
  mov     r8d, 4               ; ins index
  mov     [rsp+48h+var_18], 0
  mov     [rsp+48h+var_38], 0
  movaps  [rsp+48h+var_28], xmm0 ; zero 16 bytes mask
  mov     byte ptr [rsp+48h+var_28], 0A0h ; fill mask with some values
  mov     byte ptr [rsp+48h+var_28+3], 8
  ...
  mov     byte ptr [rsp+48h+var_28+1], 2
  movdqa  xmm0, [rsp+48h+var_28] ; load filled mask
  ...
  call ptx_ins_register_func
 
So I wrote a simple emulator (see function hack_ptx_ops) and extracted both tables
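To make the idea concrete, here is a minimal sketch in Python (not the author's Perl, and not the real hack_ptx_ops). It assumes that the only thing that matters for the mask is the sequence of byte stores into var_28, written in IDA's syntax as in the listing above. The names emulate_mask, ida_num and BYTE_STORE are hypothetical, and the result covers only the stores that are shown; the lines elided with "..." are not guessed.

```python
import re

# Matches IDA-style byte stores into the 16-byte mask at var_28, e.g.
#   mov byte ptr [rsp+48h+var_28+3], 8
BYTE_STORE = re.compile(
    r"mov\s+byte ptr \[rsp\+48h\+var_28(?:\+([0-9A-Fa-f]+h?))?\],\s*([0-9A-Fa-f]+h?)"
)

def ida_num(text):
    """Convert an IDA literal: '0A0h' is hex, a bare '8' is decimal."""
    return int(text[:-1], 16) if text.endswith("h") else int(text)

def emulate_mask(lines):
    """Replay the byte stores over a zeroed 16-byte mask (pxor/movaps zero it first)."""
    mask = bytearray(16)
    for line in lines:
        m = BYTE_STORE.search(line)
        if m:
            offset = ida_num(m.group(1)) if m.group(1) else 0
            mask[offset] = ida_num(m.group(2)) & 0xFF
    return bytes(mask)

# Only the stores visible in the listing; the elided ones are left out.
listing = [
    "movaps  [rsp+48h+var_28], xmm0 ; zero 16 bytes mask",
    "mov     byte ptr [rsp+48h+var_28], 0A0h ; fill mask with some values",
    "mov     byte ptr [rsp+48h+var_28+3], 8",
    "mov     byte ptr [rsp+48h+var_28+1], 2",
]
mask = emulate_mask(listing)
print(mask.hex(" "))
```

A real extractor also has to track the lea instructions that load the name, operands and types strings, plus the index in r8d; that part is left out here.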
Format of each row:
  1. instruction index
  2. 16-byte mask
  3. instruction name, then a tab
  4. instruction operands, then a tab
  5. and finally the operand types in a single string
I got 270 unique instruction names and 1420 rows
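A row in this format can be read back with a few lines of code. This is a sketch in Python under several assumptions: the index is decimal, the 16 mask bytes are space-separated hex, and the name/operands/types fields are tab-separated. Row and parse_row are hypothetical names, not part of the author's script.

```python
from typing import NamedTuple

class Row(NamedTuple):
    index: int      # instruction index (assumed decimal)
    mask: bytes     # 16-byte attribute mask
    name: str       # instruction name, e.g. "add"
    operands: str   # operands string, e.g. "000" (may be empty)
    types: str      # operand types, e.g. "F16" (may be empty)

def parse_row(line):
    # The first 17 whitespace-separated tokens are the index plus 16 mask bytes;
    # whatever remains is "name<TAB>operands<TAB>types".
    parts = line.rstrip("\r\n").split(None, 17)
    if len(parts) < 18:
        raise ValueError("row is too short: %r" % line)
    index = int(parts[0])
    mask = bytes(int(b, 16) for b in parts[1:17])
    fields = parts[17].split("\t") + ["", ""]   # pad missing operands/types
    name, operands, types = (f.strip() for f in fields[:3])
    return Row(index, mask, name, operands, types)

row = parse_row("48 20 12 00 08 " + "00 " * 12 + "add\t000\tF16")
print(row.name, row.operands, row.types, row.mask.hex(" "))
```

Rows without operands (like exit further below) come back with empty operands and types fields.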

 

Verification

The next logical question: how can we make sure that we have extracted all the data? Well, earlier I extracted a huge list of PTX instructions from their cicc, so I wrote a simple Perl script to check the intersection.
Yes, every PTX instruction from cicc is present in the dumped data. Sure, this doesn't mean that I dumped everything, but it gives me some confidence.
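The check itself is a plain set difference. A minimal Python sketch (the author's script is in Perl and is not shown here); missing_names and both toy lists are hypothetical stand-ins for the real cicc list and the real dump:

```python
def missing_names(cicc_names, dumped_names):
    """Return the names seen in cicc that are absent from the ptxas dump."""
    return sorted(set(cicc_names) - set(dumped_names))

# Toy data only; the real lists hold hundreds of names.
cicc = ["add", "ex2", "exit"]
dumped = ["add", "ex2", "exit", "genmetadata"]

missing = missing_names(cicc, dumped)
print("missing from dump:", missing)
```

An empty result means only that cicc knows nothing that the dump lacks; names that appear only in ptxas still go unchecked.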

 

Analysis

First questionable hypothesis: these 16-byte masks are bitfields of instruction attributes. For example, in add.sat.u32 the instruction name must be add, and sat and u32 are each some 1 bit in the mask
 
Let's just look at the dumped data to check how dense the bit masks are and whether they are the same for each instruction:
48 20 12 00 08 00 00 00 00 00 00 00 00 00 00 00 00 add  000     F16
48 20 10 00 02 00 00 00 00 00 00 00 00 00 00 00 00 add  000     I 
 
Wait - WHAT? The masks for the same instruction differ depending on the types of the arguments. How is this possible?
 
Well, one possible answer: the parser delays parsing of attributes until it knows the types of the operands. Frankly, I don't remember where I saw this idea - perhaps in the book "Parsing Techniques: A Practical Guide", which I read in the last century
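The difference between the two add rows is easier to see byte by byte. A small Python sketch using only the two masks quoted above; AND keeps the bits both rows share, XOR keeps the bits where they disagree. What the individual bits mean is still unknown at this point, so no bit numbering is assumed.

```python
# Masks of the two "add" rows, copied from the dump above.
add_f16 = bytes.fromhex("20 12 00 08" + " 00" * 12)
add_i = bytes.fromhex("20 10 00 02" + " 00" * 12)

common = bytes(x & y for x, y in zip(add_f16, add_i))   # bits set in both rows
differ = bytes(x ^ y for x, y in zip(add_f16, add_i))   # bits set in only one row

print("common:", common.hex(" "))
print("differ:", differ.hex(" "))
```

For these two rows the first four bytes come out as 20 10 00 00 (common) and 00 02 00 0a (differing); everything after that is zero.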

So I counted the distinct 1 bits across all masks - 113
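One way to get such a number, assuming the count means distinct bit positions used anywhere in the dump: OR all masks together and count the set bits. The sketch below is Python with a hypothetical helper union_popcount, run on just three masks quoted in this post, so it does not reproduce the 113 from the full 1420 rows.

```python
def union_popcount(masks):
    """OR all 16-byte masks together and count the distinct set bits."""
    union = 0
    for m in masks:
        # Byte order does not affect the number of set bits.
        union |= int.from_bytes(m, "little")
    return bin(union).count("1")

sample = [
    bytes.fromhex("20 12 00 08" + " 00" * 12),   # add ... F16
    bytes.fromhex("20 10 00 02" + " 00" * 12),   # add ... I
    bytes.fromhex("18" + " 00" * 15),            # exit
]
used_bits = union_popcount(sample)
print(used_bits)
```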


Attributes tables

and continued looking through the disassembly. Actually this was the most tedious part of the work - finding and extracting over a hundred tables with encrypted strings. Result:
ls -1 tabs/ | wc -l
113

Hallelujah - the numbers add up


Analysis 2

OK, I have masks per instruction and the tables - but how to link them? Let's choose some attribute table and place it at index 0; then for index 1 we still have 112 variants, and so on. In essence there are 113! possible variants
 
Here I made a couple more hypotheses:
  1. masks must preserve the order of attributes
  2. because the masks are used as a 128-bit word in an XMM register, the set bit with the lowest index must be located to the left. For example, the attribute corresponding to index 1 must precede the attribute at index 2
So I added several simple set-operation options to my script:
  • -f - do frequency analysis for each non-zero bit in the masks
  • -a - intersection of masks for several instructions
  • -i - intersection of masks for several instructions minus the masks of the remaining ones
  • -o - union of masks for several instructions minus the masks of the remaining ones
All masks found so far are stored in the map gk_tabs and can be ignored with option -k
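The four options map naturally onto bitwise operations if each mask is held as one integer. The following Python sketch is one reading of the option descriptions above, not a port of the author's Perl; every function name is hypothetical, and it assumes that an instruction's masks are all rows carrying its name. The toy rows use 4-bit masks so the results are easy to check by hand.

```python
from collections import Counter

MASK_BITS = 128

def frequency(rows):
    """-f: how many rows have each bit set."""
    freq = Counter()
    for _, mask in rows:
        freq.update(i for i in range(MASK_BITS) if mask >> i & 1)
    return freq

def _split(rows, names):
    chosen = [m for n, m in rows if n in names]
    rest = [m for n, m in rows if n not in names]
    return chosen, rest

def _or_all(masks):
    out = 0
    for m in masks:
        out |= m
    return out

def _and_all(masks):
    out = (1 << MASK_BITS) - 1 if masks else 0
    for m in masks:
        out &= m
    return out

def intersection(rows, names):
    """-a: bits set in every row of the chosen instructions."""
    return _and_all(_split(rows, names)[0])

def intersection_minus_rest(rows, names):
    """-i: like -a, minus any bit used by the remaining instructions."""
    chosen, rest = _split(rows, names)
    return _and_all(chosen) & ~_or_all(rest)

def union_minus_rest(rows, names):
    """-o: bits set in any chosen row, minus bits used by the remaining ones."""
    chosen, rest = _split(rows, names)
    return _or_all(chosen) & ~_or_all(rest)

def drop_known(mask, known):
    """-k: clear bits that are already identified."""
    return mask & ~known

# Toy rows: (instruction name, mask as integer).
rows = [("add", 0b0111), ("sub", 0b0101), ("exit", 0b1001)]
picked = {"add", "sub"}
print(bin(intersection(rows, picked)))
print(bin(intersection_minus_rest(rows, picked)))
print(bin(union_minus_rest(rows, picked)))
```

For example, if a set of instructions is known to share an attribute that no other instruction has, the -i style query narrows the candidate bits for that attribute table, and drop_known removes the bits that are already assigned.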
 
And a couple of words on why nvidia lies, as usual:
  1. there are totally undocumented instructions like genmetadata/spmetadata
  2. there are totally undocumented attributes - for example ignoreC_pred/ignoreC/frel
  3. some instructions have attributes not present in the official documentation
for example, exit:
202 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 exit

 

Current status 

In a couple of days I was able to identify 26 attribute tables - only 23%
So if you are passionate about PTX, and love digging into unstructured data and performing operations on sets, your help is welcome
