It's funny how paranoid NVIDIA is about hiding as much information from its customers as it can. One example is the so-called "special registers": even with PTX you can read only a limited set of them.
So I played a bit with some undocumented SRs, namely SR_MACHINE_ID_XX and SR_REGALLOC. I wrote code that loads special registers the legal way and then patched the resulting SASS instructions with my Ced.
Let's see how that code looks in
PTX
      mov.u32         %r2, %tid.x;
      st.global.u32   [%rd2], %r2;
      mov.u32         %r3, %tid.y;
      st.global.u32   [%rd2+4], %r3;
      mov.u32         %r4, %tid.z;
      st.global.u32   [%rd2+8], %r4;
      // inline asm
      mov.u32 %r1, %smid;
      // inline asm
      st.global.u32   [%rd2+12], %r1;
      mov.u32         %r5, 21;
      st.global.u32   [%rd2+16], %r5;
As you can see, nothing special: it just loads some values into %r1-%r5 and stores them through %rd2, which holds the address of the function argument (obtained with cvta.to.global.u64).
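
For reference, here is a minimal sketch of a CUDA kernel that should compile to PTX of roughly this shape. It is a reconstruction, not my original source: the kernel name `dump_sregs`, the `out` parameter and the host code are made up for illustration, and the register numbering you get from nvcc may differ.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (name and signature are assumptions).
// It stores five 32-bit values at offsets 0, 4, 8, 12 and 16 of out,
// in the same order as the st.global.u32 instructions in the PTX listing.
__global__ void dump_sregs(unsigned int *out)
{
    unsigned int smid;
    // %smid is a documented PTX special register without a CUDA C++ builtin,
    // so it is read through inline PTX. This is the "inline asm" part of the listing.
    asm volatile("mov.u32 %0, %%smid;" : "=r"(smid));

    out[0] = threadIdx.x;  // mov.u32 %r2, %tid.x
    out[1] = threadIdx.y;  // mov.u32 %r3, %tid.y
    out[2] = threadIdx.z;  // mov.u32 %r4, %tid.z
    out[3] = smid;         // mov.u32 %r1, %smid
    out[4] = 21;           // mov.u32 %r5, 21 (plain constant)
}

int main()
{
    unsigned int h_out[5] = {0};
    unsigned int *d_out = nullptr;

    if (cudaMalloc(&d_out, sizeof(h_out)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One block with one thread is enough to see the values.
    dump_sregs<<<1, 1>>>(d_out);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel or copy failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("tid = (%u, %u, %u), smid = %u, const = %u\n",
           h_out[0], h_out[1], h_out[2], h_out[3], h_out[4]);
    return 0;
}
```

Building it with `nvcc -ptx` dumps the PTX, and `cuobjdump -sass` on the compiled binary shows the SASS instructions that are the candidates for patching.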
