Thursday, January 15, 2026

libcuda.so internals part 2

Previous part

I've noticed that almost all real API functions have the same prologue:

    mov     eax, cs:dword_5E14C00 ; unique for each API function
    mov     [rbp+var_D0], 3E7h
    mov     [rbp+var_C0], 0
    mov     [rbp+var_C8], 0
    test    eax, eax
    jz      short loc_39603B
    lea     rdi, [rbp+var_C0]
    call    sub_2EE190 ; get data from pthread_getspecific
    test    eax, eax
    jz      loc_396118

 loc_396118:

    lea     rbx, aCustreamupdate_5  ; "cuStreamUpdateCaptureDependencies_ptsz"
    mov     [rbp+var_88], rdx
    call    call_dbg

So I extracted that dbg_callback and the array of debug tracepoints from cudbgApiDetach (see method try_dbg_flag). I don't know why the debugger needs them; probably this is part of event tracing.
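To make the control flow of that prologue easier to follow, here is a small Python model of how I read the disassembly. It is only a sketch: `api_prologue`, `get_thread_data` and the `flags` dictionary are hypothetical names, and treating `var_D0` as a default result code is a guess (0x3E7 is 999, which happens to match `CUDA_ERROR_UNKNOWN` in cuda.h). Whether the real `call_dbg` checks the callback pointer for NULL is also an assumption.

```python
DEFAULT_RESULT = 0x3E7  # value stored to var_D0; 999 == CUDA_ERROR_UNKNOWN (guess at meaning)


def get_thread_data():
    """Hypothetical stand-in for sub_2EE190 (pthread_getspecific wrapper); 0 means success."""
    return 0


def api_prologue(name, flags, callback):
    """Model of the common prologue: returns True if the debug callback fired."""
    result = DEFAULT_RESULT          # mov [rbp+var_D0], 3E7h
    flag = flags.get(name, 0)        # mov eax, cs:dword_...; one flag per API function
    if flag == 0:                    # test eax, eax / jz: tracepoint disabled
        return False
    if get_thread_data() != 0:       # call sub_2EE190 / test eax, eax
        return False
    if callback is None:             # assumption: no debugger attached, nothing to call
        return False
    callback(name)                   # call call_dbg with the API name in rbx
    return True


if __name__ == "__main__":
    seen = []
    flags = {"cuStreamUpdateCaptureDependencies_ptsz": 1}
    api_prologue("cuStreamUpdateCaptureDependencies_ptsz", flags, seen.append)
    print(seen)
```

The point of the model is that the per-function flag is the cheap gate: when it is zero, the thread-local lookup and the callback are skipped entirely.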

When you run your program under cuda-gdb this callback will be set:

api_gate at 0x155554e11940 (155552A2CB50) - /lib/x86_64-linux-gnu/libcudadebugger.so.1

Curiously, under NSight this debug machinery is not used. Traditional joke:
man nsight-sys
No manual entry for nsight-sys
 
So for now I have an exotic way to detect whether a CUDA program is running under a debugger, although this can actually be checked with the exported cudbgDebuggerInitialized.
Also, a trace of exported functions can be produced by ordinary ltrace, so what is the point?
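The simpler check via the exported symbol can be sketched with ctypes. This assumes that `cudbgDebuggerInitialized` is a 32-bit integer variable and that a nonzero value means a debugger has attached; I have not verified either against the headers, so treat the type and the interpretation as assumptions. The helper name `debugger_initialized` is mine.

```python
import ctypes


def debugger_initialized(libname="libcuda.so.1"):
    """Return the value of cudbgDebuggerInitialized, or None if it cannot be read."""
    try:
        lib = ctypes.CDLL(libname)
        # Assumption: the exported symbol is a uint32 data variable.
        return ctypes.c_uint32.in_dll(lib, "cudbgDebuggerInitialized").value
    except (OSError, ValueError):
        # OSError: the library could not be loaded.
        # ValueError: the library does not export the symbol.
        return None


if __name__ == "__main__":
    print(debugger_initialized())
```

On a machine without the driver installed the function just returns None, so it is safe to call unconditionally.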
 
Well, luckily those tracepoints are grouped into several arrays:
flags_sztab 5A87640 size 31
 [1] 8 5E15160
 [2] 5A 5E14FE0
 [3] 12 5E14F80
 [4] 5 5E14F50
 [5] 5 5E14F30
 [6] 341 5E14220 ; public API tracepoints
 [7] 400 5E13220
 [8] D 5E131E0
 [9] 4 5E131C0
 [10] 3 5E131B0
 [11] 4 5E131A0
 [12] 8 5E13180
 [13] B 5E13140
 [14] 3 5E13118
 [15] E 5E130E0
 [16] 7 5E130B0
 [17] 5 5E13090
 [18] 9 5E13060
 [19] 1C 5E12FE0
 [20] 6 5E12FC0
 [21] 16 5E12F60
 [22] 4 5E12F40
 [23] 2 5E12F38
 [24] D 5E12F00
 [25] 4 5E12EE0
 [26] 4 5E12ED0
 [27] A 5E12EA0
 [28] 7 5E12E80
 [29] 5 5E12E60
 [30] 2 5E12E50

Here the second column is the size (in hex) and the third is a pointer to the tracepoints array. As you can see, there are many more internal tracepoints, and I hope that enabling and logging them will greatly help with further libcuda RE.
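As a quick sanity check, the dump can be used to map a flag address back to its array. The sketch below parses a few rows of the listing and looks up `dword_5E14C00` from the prologue shown earlier. It assumes each tracepoint flag is a 4-byte dword (the prologue reads it with `mov eax`, and the array addresses are spaced consistently with that); `parse` and `locate` are hypothetical helper names.

```python
import re

# Excerpt of the dump above: "[index] size_hex address_hex".
DUMP = """
 [5] 5 5E14F30
 [6] 341 5E14220
 [7] 400 5E13220
 [8] D 5E131E0
"""

ELEM_SIZE = 4  # assumption: one dword per tracepoint flag


def parse(dump):
    """Turn the text dump into a list of (group, size, base_address) tuples."""
    rows = []
    for m in re.finditer(r"\[(\d+)\]\s+([0-9A-Fa-f]+)\s+([0-9A-Fa-f]+)", dump):
        rows.append((int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)))
    return rows


def locate(addr, rows):
    """Return (group, index) of the flag at addr, or None if it is in no array."""
    for group, size, base in rows:
        off = addr - base
        if 0 <= off < size * ELEM_SIZE and off % ELEM_SIZE == 0:
            return group, off // ELEM_SIZE
    return None


ROWS = parse(DUMP)
print(locate(0x5E14C00, ROWS))  # prints (6, 632)
```

The flag from the cuStreamUpdateCaptureDependencies_ptsz prologue lands in array 6 at index 632, which agrees with array 6 holding the public API tracepoints.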

Memory RT functions

It seems that in the previous part I did not extract the full set of RT functions: memory functions like memcpy128/memcpyDtoD2D/memset32 were missed. So I extracted them into a separate archive.
