среда, 1 октября 2025 г.

addresses of cuda kernel functions

 Quote from official document:

It is not allowed to take the address of a __device__ function in host code

I haven't been surprised for a long time that entire CUDA is made up of ridiculous restrictions. What if I told you that paranoid nvidia lies as usually and actually you can get addresses of kernel functions in your host code?

But first lets check what workarounds we can employ to have functions pointers. I don't know for what pedagogical purpose this code intentionally was written so poorly and does not free the allocated memory - and now millions of brainless artificial idiots will copy-paste it forever, so I made patched version. You can realize that attempt to read from early gathered with cudaMemcpyFromSymbol addresses will results error 1 (invalid argument)

Ok. but we could just return address of function directly from another kernel function, right? So I made quick & dirty hack
I brute-forced all combinations of cf1(__device__/__constant__) & variants of cudaMemcpyFromSymbol/cudaMemcpy - and with no luck
So it's time to run

cuda-gdb

 
what we want to observe in debugger is registers R18 & R19 after instructions loading address of function calc_hash
 /*0080*/ MOV R18, 32@lo(_Z9calc_hashPKcPi) ;
 /*0090*/ MOV R19, 32@hi(_Z9calc_hashPKcPi) ;

And first unpleasant surprise - gdb disasm shows them as
 MOV R18, 0
 MOV R19, 0

It seems that it just can't show instructions with relocs, for example we can see the same picture on loading of format string $str:
 MOV R4, 0
 MOV R5, 0

instead of
 /*00b0*/ MOV R4, 32@lo($str) ;
 /*00c0*/ MOV R5, 32@hi($str) ;
 
quick check with info registers $RXX shows that R18 & R19 are zero while R4 & R5 contains some non-zero address. Wtf - they both contains relocs to some symbols according to output of nvdisasm?

nvdisasm lies

Yep, again
I run my nvd to see what relocs we have in reality:
[9] .rel.text._Z10dirty_hackPPcS0_ type 9 flags 0
7 relocs:
 [7] 20 sym 5 cf1 R_CUDA_ABS32_LO_32
 [7] 40 sym 5 cf1 R_CUDA_ABS32_HI_32
 [7] 80 sym 15 _Z9calc_hashPKcPi R_CUDA_FUNC_DESC32_LO_32
 [7] 90 sym 15 _Z9calc_hashPKcPi R_CUDA_FUNC_DESC32_HI_32
 [7] B0 sym 6 $str R_CUDA_ABS32_LO_32
 [7] C0 sym 6 $str R_CUDA_ABS32_HI_32
 [7] 120 sym 16 vprintf R_CUDA_ABS47_34
Can you spot the difference? Yep, format string $str has relocs of R_CUDA_ABS_XXX, while loading address of function - R_CUDA_FUNC_DESC32_XXX. Official nvdisasm just can't show you it

real solution 

the next step I thought that there was no actual difference between symbols $str & calc_hash - they both located in separate sections, so I just patched reloc types for instructions 80 & 90:
[9] .rel.text._Z10dirty_hackPPcS0_ type 9 flags 0
7 relocs:
 [7] 20 sym 5 cf1 R_CUDA_ABS32_LO_32
 [7] 40 sym 5 cf1 R_CUDA_ABS32_HI_32
 [7] 80 sym 15 _Z9calc_hashPKcPi R_CUDA_ABS32_LO_32
 [7] 90 sym 15 _Z9calc_hashPKcPi R_CUDA_ABS32_HI_32
 [7] B0 sym 6 $str R_CUDA_ABS32_LO_32
 [7] C0 sym 6 $str R_CUDA_ABS32_HI_32
 [7] 120 sym 16 vprintf R_CUDA_ABS47_34

 
And yes - it worked after that:
f_addr: 0x7f8235a00100 (nil)
cf1 (nil) what 0x7f8235a00100 value (nil) calc_hash 0x7f8237e32d00
f_addr2: 0x7f8235a00100 0x7f8237e32d00
In any unclear situation just always patch cubin!
 
Next logical question - what tool is responsible for this information hiding? I think it's notorious ptxas but too lazy to proof it
 
Happy hacking!

Комментариев нет:

Отправить комментарий