Pseudo instructions
Surprise, surprise: some PTX instructions are not mapped 1:1 to the underlying SASS. Instead they expand into lots of other PTX code. I have already extracted their decrypted bodies, so it's time to describe how they are connected to specific PTX pseudo instructions
Somewhere deep inside ptxas there is a function that registers lots of handlers for dumping the real PTX for pseudo instructions. The code that registers a single item looks like this:
mov rdi, [rbx+250h] ; dictionary of pseudo-instructions
lea rdx, emit_multimem_ld_reduce ; handler
lea rsi, aMultimemLdRedu ; "multimem.ld_reduce" - pseudo instruction name
call reg_sm_cb
There are 587 such handlers, although 473 of them have strange names like "1030557441". I don't know what they mean. Most likely they are the product of yet another layer of encryption somewhere inside the parser; at least each such string has exactly 1 reference
Let's look inside one of the handlers
call get_pool
mov rdi, [rax+18h]
mov esi, 0C350h ; 50000 bytes - they don't skimp on memory
call alloc_buf
test rax, rax
mov r12, rax ; r12 holds address of string buffer
jz loc_5626FC1E9D78 ; die in alloc_failed
loc_5626FC1E9733: ; CODE XREF: emit_multimem_ld_reduce+67D↓j
lea rdx, [r13+1A5E95h] ; whut ?
lea rsi, aS_11 ; "%s"
mov rdi, r12 ; s
xor eax, eax
call _sprintf ; note: not even snprintf - security above all!
lea rdx, [r13+1A5E98h] ; whut again ?
movsxd rdi, eax ; store in rdi length of written string
lea rsi, aS_11 ; "%s"
mov rbx, rdi
xor eax, eax
add rdi, r12 ; s
call _sprintf
The debugger showed that R13 holds the address of the decrypted string pool in memory.
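To make the pattern in the listing concrete, here is a minimal Python sketch of what such a handler appears to do: take a base address of the string pool, read a NUL-terminated string at each fixed offset, and append them one after another (the chained `sprintf("%s")` calls). The names `cstring_at`, `emit` and `build_toy_pool` are my own, and the pool contents are a toy example, not real ptxas data.

```python
def cstring_at(pool: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in `pool`."""
    end = pool.find(b"\x00", offset)
    if end == -1:  # no terminator: read to the end of the pool
        end = len(pool)
    return pool[offset:end].decode("latin-1")


def emit(pool: bytes, offsets: list[int]) -> str:
    """Mimic the chained sprintf("%s") calls: one pool string per offset."""
    return "".join(cstring_at(pool, off) for off in offsets)


def build_toy_pool(parts: list[bytes]) -> tuple[bytes, list[int]]:
    """Pack strings into a NUL-separated pool and remember where each starts."""
    pool = bytearray()
    offsets = []
    for part in parts:
        offsets.append(len(pool))
        pool += part + b"\x00"
    return bytes(pool), offsets


# Toy data only (an assumption for illustration, not taken from ptxas).
toy_pool, toy_offsets = build_toy_pool([b"{\n", b".reg .b32 %r;\n", b"}\n"])
print(emit(toy_pool, toy_offsets), end="")
```

Note that two offsets can be only a few bytes apart, as with 1A5E95h and 1A5E98h in the listing, which simply means the first string is very short.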
Just assess the level of paranoia: there is a huge encrypted blob of strings, 1.8 MB in size. Then they wrote 587 functions in which each string from that blob is referenced only by offset - 21042 unique offsets! Nvidia definitely didn't want us to see its dirty secrets.
So I wrote some code to extract all the emitters and then all the string offsets - see the result. Now it would be good to link the offsets from each emitter to the real strings, right?
Nothing could be simpler: yet another Perl XS module to load a memory-mapped file, plus a small Perl script, and finally we can see this
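As a rough illustration of the offset extraction step (not the actual tool, which works on the binary itself), the sketch below pulls pool offsets out of an IDA-style text listing of one emitter. It assumes that the pool base always lives in r13 and that offsets are printed as hex with an `h` suffix, as in the listing above; `POOL_REF` and `pool_offsets` are hypothetical names.

```python
import re

# Matches operands such as "[r13+1A5E95h]" in an IDA-style listing.
# Assumption: the pool base register is always r13.
POOL_REF = re.compile(r"\[r13\+([0-9A-Fa-f]+)h\]")


def pool_offsets(listing: str) -> list[int]:
    """Return pool offsets referenced via r13, in order of first appearance."""
    seen: dict[int, None] = {}  # dict keeps insertion order and drops repeats
    for match in POOL_REF.finditer(listing):
        seen.setdefault(int(match.group(1), 16), None)
    return list(seen)


listing = """\
lea rdx, [r13+1A5E95h]
lea rsi, aS_11
lea rdx, [r13+1A5E98h]
mov rdi, [rax+18h]
"""
print([hex(off) for off in pool_offsets(listing)])
```

Operands based on other registers, such as `[rax+18h]`, are deliberately ignored.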