Last week I tried to solve a deceptively simple problem: which c[0x0][XX] offsets ptxas uses to access kernel parameters like %nctaid. I couldn't find any convenient tables inside ptxas covering the whole pile of SMs, so as usual I resorted to brute force.
The parameter names are documented in the official PTX documentation as special registers. This time ptxas does not seem to add anything to that list.
Brute-forcing
Next, write a fake function named trash in plain PTX that returns a u32 value, something like:
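.version 7.0
.target sm_52 // vary per SM; newer SMs need a newer .version
.address_size 64

.visible .func (.param .u32 func_retval0) trash
{
.reg .u32 %r<3>;
.reg .u64 %rd<1>;
// %gridid is a 64-bit special register: read it as u64, then truncate
mov.u64 %rd0, %gridid;
cvt.u32.u64 %r0, %rd0;
mov.u32 %r1, %nctaid.x;
add.u32 %r0, %r0, %r1;
// without this store ptxas eliminates everything above as dead code
st.param.u32 [func_retval0+0], %r0;
ret;
}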
The final st.param is very important, because otherwise the compiler just eliminates the whole body as dead code. In place of %gridid and %nctaid.x we can substitute any pair of special registers, compile with ptxas for a specific SM, and then parse the output of nvdisasm, nvd, or any other tool that can disassemble SASS. Surprisingly, this dumb method worked very well. However, there are still gaps among the recovered parameter offsets, so it's time to check the CUDA runtime.
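The substitution step can be sketched as a small Python generator that stamps any pair of special registers into the trash template. This is a minimal sketch under my own assumptions: the helper names make_trash_ptx and read_sreg, the .version/.target header, and the set of 64-bit registers that get a truncating cvt are illustrative. Running ptxas and a disassembler on the result is left out.

```python
# Hypothetical helper that builds the "trash" PTX module for a pair of special registers.
# Assumption: 64-bit special registers are read into a .u64 register and truncated;
# everything else is treated as 32-bit and read directly with mov.u32.
SREG_64 = {"%gridid", "%clock64", "%globaltimer"}


def read_sreg(dst: str, tmp: str, sreg: str) -> list[str]:
    """Return PTX lines that load special register `sreg` into 32-bit register `dst`."""
    if sreg in SREG_64:
        return [f"mov.u64 {tmp}, {sreg};", f"cvt.u32.u64 {dst}, {tmp};"]
    return [f"mov.u32 {dst}, {sreg};"]


def make_trash_ptx(sreg_a: str, sreg_b: str, target: str = "sm_52",
                   version: str = "7.0") -> str:
    """Return a complete PTX module whose trash() returns the sum of two special registers."""
    body = (
        read_sreg("%r0", "%rd0", sreg_a)
        + read_sreg("%r1", "%rd1", sreg_b)
        + [
            "add.u32 %r0, %r0, %r1;",
            # storing the result keeps ptxas from eliminating the reads as dead code
            "st.param.u32 [func_retval0+0], %r0;",
            "ret;",
        ]
    )
    lines = [
        f".version {version}",
        f".target {target}",
        ".address_size 64",
        "",
        ".visible .func (.param .u32 func_retval0) trash",
        "{",
        ".reg .u32 %r<2>;",
        ".reg .u64 %rd<2>;",
        *body,
        "}",
    ]
    return "\n".join(lines) + "\n"
```

Looping this over all pairs of documented special registers and every target SM produces the whole corpus of modules to compile and disassemble.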
CUDA runtime
I extracted its device code (ELF files) back in December, so now we can parse the nvdisasm output to find the offsets not identified yet. The funny part is that the official nvdisasm fails on several of these files, e.g. on sm54.elf:
nvdisasm error : Could not establish the target of this branch operation
or on sm23.elf:
nvdisasm error : Wrong Anti dependency order in function 'vfprintf_internal'
nvdisasm . @P1 LD.E.CG.64 R14, [R4], P0
nvdisasm . -- Anti(PRED,0),0*,0 -->
nvdisasm . @!P1 LEA.HI.X P0, R7, R12, RZ, R13
So I was forced to use my own nvd.
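Finding the not-yet-identified offsets boils down to collecting every bank-0 constant operand in a listing and subtracting the ones already mapped. A minimal Python sketch, assuming the listing is plain text that spells operands either the nvdisasm way (c[0x0][0x8]) or the shorter c[0][0x8] way. The function names and the known map are hypothetical, and a 64-bit parameter referenced through its upper half would need extra size handling.

```python
import re

# Bank-0 constant operands: c[0x0][0x8] (nvdisasm style) or c[0][0x8].
# Group 1 captures the byte offset in hex; other banks such as c[0x2][...] are ignored.
CB0_RE = re.compile(r"c\[(?:0x0|0)\]\[(0x[0-9a-fA-F]+)\]")


def cb0_offsets(sass: str) -> set[int]:
    """Return every constant bank 0 offset referenced in a disassembly listing."""
    return {int(m.group(1), 16) for m in CB0_RE.finditer(sass)}


def unknown_offsets(sass: str, known: dict[int, str]) -> list[int]:
    """Return, sorted, the offsets used in `sass` that are missing from the offset->name map."""
    return sorted(off for off in cb0_offsets(sass) if off not in known)
```

Running unknown_offsets over all the runtime's ELF files, with the table from the brute-force step as known, yields the list of offsets that still need a name.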
It turned out that the parameter space is divided into two parts. One of them is a block of parameters starting at offset 0x1860 (holding, for example, the kernel's starting PC), which is used by the kernel launch logic and the CnpXXX functions.
So now we know lots of offsets and their sizes. However, to identify the semantics of many of the offsets found, we need a debugger.
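Once offsets and sizes are known, the ranges still worth poking at in the debugger are simply the uncovered stretches of the bank. A minimal sketch, assuming the known parameters are kept in a hypothetical offset -> size dictionary and that you choose the [start, end) window to scan yourself. Overlapping entries are tolerated.

```python
def find_gaps(known: dict[int, int], start: int, end: int) -> list[tuple[int, int]]:
    """Return [gap_start, gap_end) byte ranges inside [start, end) not covered by
    any known parameter; `known` maps offset -> size in bytes."""
    gaps = []
    cursor = start  # first byte not yet covered by a known parameter
    for off, size in sorted(known.items()):
        if off + size <= cursor or off >= end:
            continue  # entirely behind the cursor or beyond the window
        if off > cursor:
            gaps.append((cursor, off))  # uncovered bytes before this parameter
        cursor = max(cursor, off + size)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps
```

Each returned range is a candidate for inspection with the debugger, as described next.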
cuda-gdb rushes to the rescue
I made a fake PTX kernel for each SM, patched it with my ced, and inspected the values in the debugger with $_cuda_const_bank(0, offset). This was actually the most boring part of the work, and I still haven't recognized some fields. Also, I don't own expensive monsters like sm100+, so I extracted the parameters only for SMs from Maxwell up to Hopper.
Results
/*58*/ XMAD R02,R17,c[0][0x8],R02 ?trans1;
; cb0 param %ntid_x
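An annotated line like the one above can be produced by a table lookup on every bank-0 operand. A minimal sketch, not how nvd is actually implemented: the PARAMS table holds only the single mapping visible in the sample (0x8 -> %ntid_x for that SM), and the regex assumes the c[0][0x8] / c[0x0][0x8] operand spellings.

```python
import re

# Bank-0 constant operands: c[0x0][0x8] or c[0][0x8]; group 1 is the hex offset.
CB0_RE = re.compile(r"c\[(?:0x0|0)\]\[(0x[0-9a-fA-F]+)\]")

# Offset -> name table for one SM. Only 0x8 -> %ntid_x comes from the sample above;
# a real table would be filled from the brute-force and debugger results.
PARAMS = {0x8: "%ntid_x"}


def annotate(line: str, params: dict[int, str] = PARAMS) -> str:
    """Return `line` followed by one '; cb0 param <name>' comment per known bank-0 operand."""
    out = [line]
    for m in CB0_RE.finditer(line):
        name = params.get(int(m.group(1), 16))
        if name is not None:
            out.append(f"; cb0 param {name}")
    return "\n".join(out)
```

Lines without a known bank-0 operand are returned unchanged, so the function can be mapped over a whole listing.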
Names starting with '%' were extracted simply by disassembling the fake trash function.
Happy hacking!