Tuesday, June 23, 2026

RE of PTX grammar from ptxas, part 4

Parts 1, 2 & 3

First of all, it should be noted that the mask of instruction attributes is 20 bytes long, so I updated the dump for them.
The structure of this attributes descriptor is 0xd8 bytes long and has these known fields:
  • mask at offset 0
  • name of instruction at 0xC8
  • index at 0xD0
Instructions are selected first by name, and then the right form is chosen by operand types. This means that while the order of the attributes does not matter, the relative order of the operand types is important: the leftmost is the type of operand 0, the next is the type of operand 1, and so on.
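The layout above can be sketched in Python roughly like this. Only the three offsets and the two sizes come from the text; the field widths (a 64-bit pointer for the name, a 32-bit integer for the index), the LSB-first bit numbering and the names `parse_descriptor`, `name_ptr` and `set_bits` are my assumptions for illustration.

```python
import struct

DESC_SIZE = 0xD8   # size of one descriptor (from the text)
MASK_SIZE = 20     # attribute mask at offset 0 (from the text)
NAME_OFF = 0xC8    # name of instruction (from the text)
INDEX_OFF = 0xD0   # index (from the text)

def parse_descriptor(buf, off=0):
    """Split one 0xd8-byte descriptor into the three known fields."""
    rec = bytes(buf[off:off + DESC_SIZE])
    if len(rec) != DESC_SIZE:
        raise ValueError("truncated descriptor")
    mask = rec[:MASK_SIZE]
    # Assumption: the name is a 64-bit pointer, the index a 32-bit integer.
    (name_ptr,) = struct.unpack_from("<Q", rec, NAME_OFF)
    (index,) = struct.unpack_from("<I", rec, INDEX_OFF)
    # Assumption: bits are numbered LSB-first inside each byte.
    set_bits = [i for i in range(MASK_SIZE * 8)
                if (mask[i // 8] >> (i % 8)) & 1]
    return {"mask": mask, "name_ptr": name_ptr,
            "index": index, "set_bits": set_bits}
```

The 0xB4 bytes between the mask and the name are left untouched here, since their meaning is not described.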

 

Names of numerical pseudo-instructions

In part 3 I pointed out that there are 473 names consisting only of digits, like "1030557441". Grigory Evko suggested that these are adler32 hashes of builtin function names, so I found a huge function for instruction 0xc6 (_gen_proto) that returns 1078 prototypes like
.weak .func (.reg .f32 %fv1) __cuda_sm20_div_rz_f32 (.reg .f32 %fa1, .reg .f32 %fa2)
and then matched them by hash, so now we know all the real names.
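A minimal sketch of that matching step in Python. The assumptions here are mine: that the hash is taken over the bare function name as ASCII bytes and printed as an unsigned decimal, and that the name can be pulled out of a prototype with a simple regex. `func_name`, `adler32_name` and `resolve` are hypothetical helpers, not part of iptx.pl.

```python
import re
import zlib

def func_name(proto):
    """Extract the function name from a '.func' prototype line."""
    m = re.search(r"\.func\s*(?:\([^)]*\)\s*)?(\w+)", proto)
    return m.group(1) if m else None

def adler32_name(name):
    """Decimal adler32 of a name (assumed: ASCII bytes, unsigned)."""
    return str(zlib.adler32(name.encode("ascii")))

def resolve(numeric_names, prototypes):
    """Map numeric pseudo-instruction names to real function names."""
    by_hash = {}
    for proto in prototypes:
        name = func_name(proto)
        if name is not None:
            by_hash[adler32_name(name)] = name
    # Keep only the numeric names whose hash matches some prototype.
    return {n: by_hash[n] for n in numeric_names if n in by_hash}
```

Numeric names that stay unresolved would mean either a missing prototype or a wrong guess about what exactly is hashed.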


EBNF grammar

You can see it here
To build it, run iptx.pl -e
The last two columns are the operand suffix and the encoding

 

How complete is it?

That's a good question. If we accept that the attribute descriptors contain the full list of attributes for each instruction, then the 20-byte masks have 121 non-zero bits:
FD FF FF FF F1 FF FF 9F F9 FF E7 CF FF F3 DF FF FF 00 00 00
I was able to identify 114 of them, which is 94%.
Also, so far I have extracted 119 tables with attribute names, and only 11 are still not connected (check them with iptx.pl -t).
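The bit arithmetic is easy to re-check with a few lines of Python (the variable names are mine, the numbers are from the text):

```python
# The 20-byte mask quoted above.
mask = bytes.fromhex(
    "FD FF FF FF F1 FF FF 9F F9 FF E7 CF FF F3 DF FF FF 00 00 00"
)

# Count the non-zero bits in every byte and sum them up.
total = sum(bin(b).count("1") for b in mask)

identified = 114  # number of identified bits, from the text
coverage = round(100 * identified / total)

print(len(mask), total, coverage)  # 20 121 94
```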

On the other hand, the function that processes attributes has 3 switch tables with 139, 140 and 173 cases (in the last one ~90% of the entries end with the error "Unexpected instruction types specified").

Wednesday, June 10, 2026

recovering tokens from (f)lex generated code

While doing some reverse engineering of ptxas I discovered that its lexer was generated by flex in fast mode (flex -F, i.e. the -CF table representation, not -f). Knowing that nvidia tries to hide as much as possible from us, it would be good to extract which tokens the lexer is able to consume. Surprisingly, I was unable to find a solution for this simple task of token recovery on Google. Even worse, it seems that nobody understands how the 40-year-old lex DFA code works. So, as usual, I had to do it myself.

 

Code

Let's check what the generated code looks like:

struct yy_trans_info
        {
        flex_int32_t yy_verify;
        flex_int32_t yy_nxt;
        };
static const struct yy_trans_info *yy_start_state_list[3] =
    {
    &yy_transition[1],
    &yy_transition[3],
    &yy_transition[24],
    } ; 

if ( ! (yy_start) )
    (yy_start) = 1; /* first start state */

while(1) {

    yy_current_state = yy_start_state_list[(yy_start)];
yy_match:
    {
        const struct yy_trans_info *yy_trans_info;
        YY_CHAR yy_c;

        /* follow transitions while the slot really belongs to this state */
        for ( yy_c = YY_SC_TO_UI(*yy_cp);
              (yy_trans_info = &yy_current_state[yy_c])->yy_verify == yy_c;
              yy_c = YY_SC_TO_UI(*++yy_cp) )
        {
            yy_current_state += yy_trans_info->yy_nxt;
            /* entry [-1] of a state holds its action, 0 = not accepting */
            if ( yy_current_state[-1].yy_nxt )
            {
                (yy_last_accepting_state) = yy_current_state;
                (yy_last_accepting_cpos) = yy_cp;
            }
        }
    }
yy_find_action:
    yy_act = yy_current_state[-1].yy_nxt;
do_action:
    switch ( yy_act )
    { /* beginning of action switch */
        case 0: /* must back up */
            /* undo the effects of YY_DO_BEFORE_ACTION */
            *yy_cp = (yy_hold_char);
            yy_cp = (yy_last_accepting_cpos) + 1;
            yy_current_state = (yy_last_accepting_state);
            goto yy_find_action;
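To make the loop above easier to follow, here is a small Python model of it. This is only a sketch under my own assumptions: `TABLE` is a hand-made toy table in the same (yy_verify, yy_nxt) layout, not data from ptxas, input codes are 1 for 'a' and 2 for 'b', and `step`, `match` and `enumerate_tokens` are hypothetical names. The bounded walk in `enumerate_tokens` is just one possible way to list the strings a DFA of this shape accepts.

```python
# A state is an index into TABLE; TABLE[state - 1][1] is its action
# (0 = not accepting), TABLE[state + c] is the slot for input code c.
TABLE = [
    (-1, 0),  # 0: action slot of S0 (not accepting)
    (-1, 0),  # 1: S0 base
    (1, 4),   # 2: S0 on 'a' -> S1 (1 + 4 = 5)
    (2, 8),   # 3: S0 on 'b' -> S2 (1 + 8 = 9)
    (-1, 1),  # 4: action slot of S1 (rule 1)
    (-1, 0),  # 5: S1 base
    (1, 0),   # 6: S1 on 'a' -> S1
    (2, 4),   # 7: S1 on 'b' -> S2 (5 + 4 = 9)
    (-1, 2),  # 8: action slot of S2 (rule 2)
    (-1, 0),  # 9: S2 base
    (-1, 0),  # 10: S2 has no transition on 'a'
    (-1, 0),  # 11: S2 has no transition on 'b'
]
S0 = 1

def step(table, state, c):
    """One transition: valid only if the slot's verify field equals c."""
    verify, nxt = table[state + c]
    return state + nxt if verify == c else None

def match(table, start, codes):
    """Model of yy_match/yy_find_action: returns (action, length)."""
    state, pos, last = start, 0, None
    while pos < len(codes):
        nxt_state = step(table, state, codes[pos])
        if nxt_state is None:
            break
        state, pos = nxt_state, pos + 1
        if table[state - 1][1]:
            last = (state, pos)        # remember last accepting state
    act = table[state - 1][1]
    if act == 0 and last is not None:  # "must back up"
        state, pos = last
        act = table[state - 1][1]
    return act, pos

def enumerate_tokens(table, start, alphabet, max_len):
    """Walk the DFA and collect accepted inputs up to max_len codes."""
    found, stack = {}, [(start, ())]
    while stack:
        state, path = stack.pop()
        act = table[state - 1][1]
        if path and act:
            found[path] = act
        if len(path) < max_len:
            for c in alphabet:
                nxt_state = step(table, state, c)
                if nxt_state is not None:
                    stack.append((nxt_state, path + (c,)))
    return found
```

For example, `match(TABLE, S0, [1, 1, 2])` walks S0 -> S1 -> S1 -> S2 and returns rule 2 with length 3. The depth limit in `enumerate_tokens` is needed because loops like the one in S1 accept arbitrarily long inputs.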
 

Tuesday, June 2, 2026

RE of PTX grammar from ptxas, part 3

Parts 1 & 2

Pseudo instructions

Surprise, surprise: some PTX instructions are not mapped 1:1 to the underlying SASS. Instead they expand into lots of other PTX code. I already extracted their decrypted bodies, so it's time to describe how they are connected to specific PTX pseudo instructions.
 
Somewhere deep inside ptxas there is a function which registers lots of handlers for dumping real PTX for pseudo instructions. The code for registering a single item looks like
  mov     rdi, [rbx+250h] ; dictionary of pseudo-instructions
  lea     rdx, emit_multimem_ld_reduce ; handler
  lea     rsi, aMultimemLdRedu         ; "multimem.ld_reduce" - pseudo instruction name
  call    reg_sm_cb

There are 587 such handlers, although 473 of them have strange names like "1030557441". I don't know what they mean; most likely this is the product of another encryption somewhere inside the parser. At least each such string has exactly 1 reference.
Let's look inside one of the handlers
  call    get_pool
  mov     rdi, [rax+18h]
  mov     esi, 0C350h ; 50000 bytes - they are not penny-pinching
  call    alloc_buf
  test    rax, rax
  mov     r12, rax ; r12 holds address of string buffer
  jz      loc_5626FC1E9D78 ; die in alloc_failed
loc_5626FC1E9733:            ; CODE XREF: emit_multimem_ld_reduce+67D↓j
  lea     rdx, [r13+1A5E95h] ; whut ?
  lea     rsi, aS_11         ; "%s"
  mov     rdi, r12           ; s
  xor     eax, eax
  call    _sprintf ; note: not even snprintf - security above all!
  lea     rdx, [r13+1A5E98h] ; whut again ?
  movsxd  rdi, eax ; rdi = length of the written string
  lea     rsi, aS_11         ; "%s"
  mov     rbx, rdi
  xor     eax, eax
  add     rdi, r12           ; s
  call    _sprintf 
 
The debugger showed that R13 holds the address of that decrypted string pool in memory.
Just assess the level of paranoia: there is a huge encrypted blob of strings, 1.8Mb in size. Then they wrote 587 functions in which each string from that blob is referenced only by offset - 21042 unique offsets! Nvidia definitely didn't want us to see its dirty secrets.
 
So I wrote some code to extract all emitters and then all string offsets - see the result. Now it would be good to link the offsets from each emitter with the real strings, right?
 
Nothing could be simpler: yet another Perl XS module to load a memory-mapped file plus a small perl script, and finally we can see this
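The real tooling is the Perl XS module mentioned above; as an illustration only, the same idea in Python could look like this. It assumes that the decrypted pool holds plain NUL-terminated strings and that a handler just concatenates the strings at its offsets, the way the chained sprintf("%s") calls do. `cstr_at` and `render_emitter` are hypothetical names.

```python
def cstr_at(pool, off):
    """Read the NUL-terminated string that starts at offset off.

    pool can be bytes or an mmap object of the decrypted blob.
    """
    end = pool.find(b"\0", off)
    if end < 0:
        raise ValueError("no terminator after offset %#x" % off)
    # latin-1 never fails on arbitrary bytes
    return bytes(pool[off:end]).decode("latin-1")

def render_emitter(pool, offsets):
    """Concatenate the strings referenced by one emitter, in order."""
    return "".join(cstr_at(pool, off) for off in offsets)
```

For a real dump the pool would come from `mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)` over the file with the decrypted blob. Handlers that use format strings with extra arguments would need more than plain concatenation.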

Lexer brute-force