The first question that comes to mind when looking at them is "why are they so huge?". For example, libcuda.so from CUDA 10.1 is 28 MB, and the one from 13.1 is already 96 MB. I rejected the idea that they are just another victim of vibe-coding and did some preliminary reverse engineering. The answer: their .rodata section contains lots of CUBIN files for the kernel run-time:
- syscalls like __cuda_syscall_cp_async_bulk_tensor_XX, __cuda_syscall_tex_grad_XX etc
- implementation of functions like cudaGraphLaunch/vprintf
- functions cnpXXX like cnpDeviceGetAttribute
- logic for kernel enqueue
- some support for profiling like scProfileBuffers
- trap handlers
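To get a rough idea of how many such blobs a given libcuda.so carries, one can scan the file for embedded ELF headers. The sketch below is only an illustration and not the method used in this post: it assumes the CUBINs are stored as plain, uncompressed 64-bit little-endian ELF images and that they use the `EM_CUDA` machine type (190 in elf.h). The name `find_cubins` is made up for this example, and the scan covers the whole file rather than just .rodata.

```python
import struct

ELF_MAGIC = b"\x7fELF"
EM_CUDA = 190  # e_machine value for NVIDIA CUDA in elf.h


def find_cubins(blob: bytes):
    """Yield offsets of embedded 64-bit LE ELF headers whose e_machine is EM_CUDA."""
    pos = blob.find(ELF_MAGIC)
    while pos != -1:
        header = blob[pos:pos + 20]
        # e_ident[4] == 2 -> ELFCLASS64, e_ident[5] == 1 -> little-endian
        if len(header) == 20 and header[4] == 2 and header[5] == 1:
            # e_machine is a 16-bit field at offset 18 of the ELF header
            (machine,) = struct.unpack_from("<H", header, 18)
            if machine == EM_CUDA:
                yield pos
        pos = blob.find(ELF_MAGIC, pos + 1)
```

Calling `list(find_cubins(open(path, "rb").read()))` on a driver library should list candidate offsets. Blobs that are compressed or wrapped in another container will not be found this way.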
API callbacks
public cuMemAdvise
cuMemAdvise proc near
    cmp cs:finited, 321CBA00h
    jz short loc_2EC348
    jmp cs:off_1B105E8
loc_2EC348: ; CODE XREF: cuMemAdvise+A↑j
    mov eax, 4 ; CUDA_ERROR_DEINITIALIZED
    retn
As you can see, each of them jumps through a pointer stored in the .data section. I don't know why this was done, but we can reuse this indirection for our own dirty purposes, such as patching the pointers to trace specific CUDA API calls (instead of the ancient LD_PRELOAD trick). So I wrote an FSM (finite-state machine) to extract them
Source. The test program disassembles libcuda.so and dumps all the callbacks it finds
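As a rough illustration of what such an extractor looks for, here is a byte-level sketch. It is not the FSM from the linked source, which works on disassembled instructions. It is a fixed-pattern matcher that assumes every thunk is encoded exactly like the cuMemAdvise listing above: `81 3D` (cmp with a RIP-relative operand and imm32), `74 06` (jz short), `FF 25` (jmp through a RIP-relative pointer), then `B8 imm32` and `C3`. The name `find_api_thunks` and the `base` parameter (virtual address of the first byte of `code`) are made up for this example. Other driver versions may lay the code out differently.

```python
import struct


def find_api_thunks(code: bytes, base: int):
    """Return (function_address, slot_address) pairs for thunks shaped like cuMemAdvise."""
    found = []
    i = 0
    while i + 24 <= len(code):
        if (code[i:i + 2] == b"\x81\x3d"              # cmp dword [rip+disp32], imm32 (10 bytes)
                and code[i + 10:i + 12] == b"\x74\x06"  # jz short, skipping the 6-byte jmp
                and code[i + 12:i + 14] == b"\xff\x25"  # jmp qword [rip+disp32] (6 bytes)
                and code[i + 18] == 0xB8                # mov eax, imm32
                and code[i + 23] == 0xC3):              # retn
            (disp,) = struct.unpack_from("<i", code, i + 14)
            # RIP-relative: the target is relative to the address of the next instruction
            slot = base + i + 18 + disp
            found.append((base + i, slot))
            i += 24
        else:
            i += 1
    return found
```

Each returned slot address is the location in .data that holds the real handler, so it is also the place one would patch to intercept that API call.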
Happy hacking!