I spent the last week trying to solve a deceptively simple problem: which offsets ptxas puts into c[0x0][XX] to access kernel parameters like %nctaid. Unfortunately I was unable to find any nice per-SM tables inside ptxas, so as usual I resorted to brute force.
The names of the params are documented in the official doc, and it seems that this time ptxas can't add anything to this list.
Brute-forcing
Next, just write in plain PTX a fake function trash with a u32 return value, something like:
.visible .func (.param .u32 func_retval0) trash
{
.reg .u32 %r<3>;
mov.u32 %r0, %gridid;
mov.u32 %r1, %nctaid.x;
add.u32 %r0, %r0, %r1;
st.param.u32 [func_retval0+0], %r0;
ret;
}
The final st.param is very important, because otherwise the compiler will just eliminate the whole code. Instead of %gridid and %nctaid.x we can substitute any pair of special registers, compile with ptxas for a specific SM and then parse the output of nvdisasm/nvd/whatever can disassemble SASS.
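The substitution step can be scripted. Below is a minimal sketch in Python (not the tooling actually used for this post) that emits one PTX file body per pair of special registers. The register list, the file naming, and the `.version`/`.target`/`.address_size` header are assumptions added for illustration; pick a PTX version that your ptxas accepts for the SM in question.

```python
from itertools import combinations

# Hypothetical selection of special registers to probe; extend as needed.
SREGS = ["%ntid.x", "%ntid.y", "%ntid.z",
         "%nctaid.x", "%nctaid.y", "%nctaid.z", "%gridid"]


def make_ptx(sreg_a, sreg_b, sm, ptx_version="6.0"):
    """Return PTX text for a 'trash' function that reads two special registers."""
    return "\n".join([
        f".version {ptx_version}",  # assumption: must suit your ptxas and SM
        f".target sm_{sm}",
        ".address_size 64",
        "",
        ".visible .func (.param .u32 func_retval0) trash",
        "{",
        ".reg .u32 %r<3>;",
        f"mov.u32 %r0, {sreg_a};",
        f"mov.u32 %r1, {sreg_b};",
        "add.u32 %r0, %r0, %r1;",
        # The store keeps the compiler from eliminating the whole body.
        "st.param.u32 [func_retval0+0], %r0;",
        "ret;",
        "}",
        "",
    ])


def make_all(sm):
    """Map a (hypothetical) file name to PTX text for every register pair."""
    def tag(sreg):
        return sreg.lstrip("%").replace(".", "_")

    return {
        f"trash_sm{sm}_{tag(a)}_{tag(b)}.ptx": make_ptx(a, b, sm)
        for a, b in combinations(SREGS, 2)
    }


if __name__ == "__main__":
    for name in make_all(61):
        print(name)
```

Each generated file can then be compiled with something like `ptxas -arch=sm_61 -o trash.cubin trash.ptx` and the result fed to the disassembler of your choice.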
Surprisingly, this stupid method worked very well; however, there are holes in the params. So it's time to check the
CUDA runtime
I extracted them in December, and now we can parse the output of nvdisasm to find offsets that are not identified yet. The funny part is that the official nvdisasm failed on several files, like sm54.elf
nvdisasm error : Could not establish the target of this branch operation
or on sm23.elf
nvdisasm error : Wrong Anti dependency order in function 'vfprintf_internal'
nvdisasm . @P1 LD.E.CG.64 R14, [R4], P0
nvdisasm . -- Anti(PRED,0),0*,0 -->
nvdisasm . @!P1 LEA.HI.X P0, R7, R12, RZ, R13
So I was forced to use my own nvd.
It turned out that the parameter space is divided into two parts: there is a block of parameters at offset 0x1860 (holding, for example, the starting PC of the kernel) used by the kernel launch logic and the CnpXXX functions
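Finding the not yet identified offsets boils down to collecting every bank 0 operand from the disassembly and subtracting the ones already known. A rough sketch in Python follows; it assumes operands are printed as `c[0x0][0x..]` (nvdisasm style) or `c[0][0x..]` (as in nvd output), and treating 0x1860 as the boundary between the two parts is my reading of the observation above, not a documented constant.

```python
import re

# Constant-bank operands such as c[0x0][0x8] or c[0][0x8].
CB_RE = re.compile(r"c\[(0x[0-9a-fA-F]+|\d+)\]\[(0x[0-9a-fA-F]+)\]")


def _to_int(text):
    """Parse a bank number written either as hex (0x..) or as decimal."""
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def bank0_offsets(sass_text):
    """Return the set of offsets referenced in constant bank 0."""
    return {
        int(off, 16)
        for bank, off in CB_RE.findall(sass_text)
        if _to_int(bank) == 0
    }


def unknown_offsets(sass_text, known):
    """Offsets seen in the disassembly that are not in the known set."""
    return sorted(bank0_offsets(sass_text) - set(known))


def split_blocks(offsets, boundary=0x1860):
    """Split offsets into those below and those at or above the boundary."""
    low = sorted(o for o in offsets if o < boundary)
    high = sorted(o for o in offsets if o >= boundary)
    return low, high


if __name__ == "__main__":
    # The first line is from this post; the others are made up for the demo.
    sample = "\n".join([
        "/*58*/ XMAD R02,R17,c[0][0x8],R02 ?trans1;",
        "MOV R1, c[0x0][0x20] ;",
        "MOV R3, c[0x0][0x1864] ;",
    ])
    print([hex(o) for o in unknown_offsets(sample, {0x8})])
```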
So now we know lots of offsets and their sizes. However, to identify the semantics of many of the found offsets we need a
debugger
cuda-gdb rushes to the rescue
I made fake PTX for each SM, patched it with my ced and inspected the values in the debugger with the command $_cuda_const_bank(0, offset). Actually this was the most boring part of the work, and I still haven't recognized some fields. Also I don't have expensive monsters like sm100+, so I extracted params only from Maxwell through Hopper
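To make the boring part slightly less boring, the debugger commands can be generated for a list of offsets and pasted into a cuda-gdb session. This is only a sketch: the post gives just $_cuda_const_bank(0, offset), so the cast-and-dereference form and the 32-bit width used here are assumptions, and fields of other sizes need a different type.

```python
def gdb_commands(offsets, bank=0, ctype="unsigned int"):
    """Build one cuda-gdb print command per constant-bank offset.

    Assumption: $_cuda_const_bank yields an address, so each command
    casts it to a pointer of the given C type and dereferences it.
    """
    return [
        f"print/x *({ctype} *)$_cuda_const_bank({bank}, {off:#x})"
        for off in sorted(offsets)
    ]


if __name__ == "__main__":
    # Hypothetical offsets; substitute the ones found in the disassembly.
    for cmd in gdb_commands([0x0, 0x8, 0x20]):
        print(cmd)
```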
Results
I also added this code to my XS perl module and nvd, so the output looks like
/*58*/ XMAD R02,R17,c[0][0x8],R02 ?trans1;
; cb0 param %ntid_x
Names starting with '%' were extracted just by disassembling the fake trash function
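The annotation itself is a plain table lookup. Here is a small Python sketch of the idea (not the actual code from the perl module or nvd). The one-entry table is taken from the output above and is only valid for the SM that produced it; the function name and the comment format are illustrative.

```python
import re

# Bank 0 operands, printed either as c[0][0x..] or c[0x0][0x..].
CB0_RE = re.compile(r"c\[(?:0x0|0)\]\[(0x[0-9a-fA-F]+)\]")


def annotate(sass_lines, names):
    """Append a '; cb0 param NAME' line after each line using a known offset."""
    out = []
    for line in sass_lines:
        out.append(line)
        for off in CB0_RE.findall(line):
            name = names.get(int(off, 16))
            if name is not None:
                out.append(f"; cb0 param {name}")
    return out


if __name__ == "__main__":
    # Single known entry from the sample output; a real table is per SM.
    table = {0x8: "%ntid_x"}
    for row in annotate(["/*58*/ XMAD R02,R17,c[0][0x8],R02 ?trans1;"], table):
        print(row)
```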
Happy hacking!