Tuesday, June 2, 2026

RE of PTX grammar from ptxas, part 3

Parts 1 & 2

Pseudo instructions

Surprise, surprise - some PTX instructions do not map 1:1 to the underlying SASS. Instead they expand into lots of other PTX code. I have already extracted their decrypted bodies, so it's time to describe how they are connected to specific PTX pseudo instructions.
 
Somewhere deep inside ptxas there is a function which registers lots of handlers that dump the real PTX for pseudo instructions. The code registering a single item looks like this:
  mov     rdi, [rbx+250h] ; dictionary of pseudo-instructions
  lea     rdx, emit_multimem_ld_reduce ; handler
  lea     rsi, aMultimemLdRedu         ; "multimem.ld_reduce" - pseudo instruction name
  call    reg_sm_cb

There are 587 such handlers, although 473 of them have strange names like "1030557441". I don't know what they mean. Most likely they are the product of yet another encryption somewhere inside the parser - at least each such string has exactly 1 reference.
Let's look inside one of the handlers:
  call    get_pool
  mov     rdi, [rax+18h]
  mov     esi, 0C350h ; 50000 bytes - they don't skimp on memory
  call    alloc_buf
  test    rax, rax
  mov     r12, rax ; r12 holds address of string buffer
  jz      loc_5626FC1E9D78 ; die in alloc_failed
loc_5626FC1E9733:            ; CODE XREF: emit_multimem_ld_reduce+67D↓j
  lea     rdx, [r13+1A5E95h] ; whut ?
  lea     rsi, aS_11         ; "%s"
  mov     rdi, r12           ; s
  xor     eax, eax
  call    _sprintf ; note: not even snprintf - security above all!
  lea     rdx, [r13+1A5E98h] ; whut again ?
  movsxd  rdi, eax ; store in rdi length of written string
  lea     rsi, aS_11         ; "%s"
  mov     rbx, rdi
  xor     eax, eax
  add     rdi, r12           ; s
  call    _sprintf 
 
The debugger showed that R13 holds the address of the decrypted string pool in memory.
Just appreciate the level of paranoia - there is a huge 1.8 MB encrypted blob of strings. Then they wrote 587 functions in which each string from that blob is referenced only by offset - 21042 unique offsets! Nvidia definitely didn't want us to see its dirty secrets.
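
To make the pattern in the listing easier to follow, here is a minimal Python model of what such a handler appears to do: take NUL-terminated strings from the pool by offset and append them to one big buffer, advancing by the length each sprintf returns. This is only a sketch. The toy pool contents, the name "toy.pseudo" and the helpers cstr_at and emit are made up for illustration; only reg_sm_cb, its argument order and the 0C350h buffer size come from the disassembly.

```python
BUF_SIZE = 0xC350  # 50000 bytes, the size passed to alloc_buf in the listing


def cstr_at(pool: bytes, offset: int) -> bytes:
    """Return the NUL-terminated byte string that starts at pool[offset]."""
    end = pool.index(b"\0", offset)
    return pool[offset:end]


def emit(pool: bytes, offsets) -> str:
    """Model of the sprintf chain: append each pool string to one buffer."""
    buf = bytearray(BUF_SIZE)
    pos = 0  # running length, accumulated like the sprintf return values
    for off in offsets:
        piece = cstr_at(pool, off)
        # The original uses plain sprintf and has no such bounds check.
        if pos + len(piece) > BUF_SIZE:
            raise OverflowError("emitted text does not fit into the buffer")
        buf[pos:pos + len(piece)] = piece
        pos += len(piece)
    return buf[:pos].decode("ascii")


handlers = {}  # stands in for the dictionary at [rbx+250h]


def reg_sm_cb(table, name, handler):
    """Same argument order as in the listing: dictionary, name, handler."""
    table[name] = handler


# Hypothetical pool: two strings, at offsets 1 and 12.
POOL = b"\0.reg .b32 \0%r0;\n\0"
reg_sm_cb(handlers, "toy.pseudo", lambda pool: emit(pool, [1, 12]))

if __name__ == "__main__":
    print(handlers["toy.pseudo"](POOL), end="")
```

The point of the model is that a handler carries no readable text of its own: without the decrypted pool, all you see is a list of offsets.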
 
So I wrote some code to extract all emitters and then all their string offsets - see the result. Now it would be good to link the offsets from each emitter to the real strings, right?
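
As a rough illustration of the offset extraction (not the actual extractor, which is not shown here), the sketch below pulls pool offsets out of an IDA-style text listing with a regular expression. It assumes every emitter addresses the pool as [r13+offset], which the listing above only confirms for one handler; the name pool_offsets and the sample text are hypothetical.

```python
import re

# Matches e.g. "lea rdx, [r13+1A5E95h]"; the trailing "h" is optional because
# IDA prints small constants without it.
LEA_RE = re.compile(r"\blea\s+\w+,\s*\[r13\+([0-9A-Fa-f]+)h?\]", re.IGNORECASE)


def pool_offsets(listing: str) -> list[int]:
    """Return unique pool offsets in order of first appearance."""
    found = (int(m.group(1), 16) for m in LEA_RE.finditer(listing))
    return list(dict.fromkeys(found))  # dict keeps insertion order


SAMPLE = """
  lea     rdx, [r13+1A5E95h] ; whut ?
  lea     rsi, aS_11         ; "%s"
  lea     rdx, [r13+1A5E98h] ; whut again ?
"""

if __name__ == "__main__":
    print([hex(off) for off in pool_offsets(SAMPLE)])
```

A real extractor would more likely walk the decoded instructions than grep text, but the output is the same: a list of offsets per emitter.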
 
Nothing could be simpler - yet another Perl XS module to load a memory-mapped file plus a small Perl script - and finally we can see this
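
The real tooling is a Perl XS module. Purely for illustration, the same lookup can be sketched in Python with the standard mmap module: map a dump of the decrypted pool and read the NUL-terminated string at each offset. The file name pool.bin and the helpers open_pool and string_at are assumptions, not part of the original tooling.

```python
import mmap
import sys


def open_pool(path: str) -> mmap.mmap:
    """Map the dumped string pool read-only; the caller should close() it."""
    with open(path, "rb") as f:
        # mmap keeps its own duplicate of the descriptor, so the file object
        # can be closed as soon as the mapping exists.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def string_at(pool, offset: int) -> str:
    """Return the NUL-terminated string starting at `offset` in the pool."""
    end = pool.find(b"\0", offset)
    if end == -1:  # no terminator: take everything up to the end of the pool
        end = len(pool)
    return pool[offset:end].decode("latin-1")


if __name__ == "__main__":
    # Usage (hypothetical file name): python lookup.py pool.bin 1A5E95 1A5E98
    pool = open_pool(sys.argv[1])
    for arg in sys.argv[2:]:
        off = int(arg, 16)
        print(f"{off:#x}: {string_at(pool, off)!r}")
    pool.close()
```

Mapping the file instead of reading it whole keeps the lookup cheap even for a 1.8 MB pool queried tens of thousands of times.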

Lexer brute-force