Thursday, February 12, 2026

libcudadebugger.so logger

I've done some research on libcudadebugger.so internals. It seems to follow exactly the same patterns:

  • the functions table returned by GetCUDADebuggerAPI is located in the .data section, so you can patch any callback address
  • each API function has a logger

This last fact is strange: the loggers from libcuda.so were consumed by the debugger, but who consumes the logs from the debugger itself? Look at the code that loads those loggers:

  lea     rdi, aNvtxInjection6          ; "NVTX_INJECTION64_PATH"
  call    _getenv
  mov     rdi, rax                      ; file
  test    rax, rax
  jz      short loc_14B160
  mov     esi, 1                        ; mode
  call    _dlopen
  mov     r13, rax
  test    rax, rax
  jz      short loc_14B190
  lea     rsi, aInitializeinje_1        ; "InitializeInjectionNvtx2"
  mov     rdi, rax                      ; handle
  call    _dlsym
  test    rax, rax
  jz      short loc_14B1A0
  lea     rdi, sub_14A270
  call    rax 
Very straightforward: load a shared library from the env var NVTX_INJECTION64_PATH and call its function InitializeInjectionNvtx2, part of the CUPTI API. Btw, an excellent injection hook
 
Unfortunately these loggers don't collect the parameters of API functions, only their names, in packets with a fixed size of 0x30 bytes:
  lea     rax, aFailedCreatede+7        ; "CreateDebuggerSession"
  mov     [rbp+var_18], rax
  mov     rax, cs:dbg_log
  mov     [rbp+var_20], 0
  mov     dword ptr [rbp+var_40], 300003h
  mov     dword ptr [rbp+var_20], 1
  movaps  [rbp+var_30], xmm0
  test    rax, rax
  jz      loc_1470AC
  lea     rdx, [rbp+var_40]
  mov     r12, rdx
  mov     rdi, rdx
  call    rax
The name of the called function is located at offset 0x28, and in the logs it looks like

Sunday, February 8, 2026

building cuda-gdb from sources

For some reason, on my machine cuda-gdb from the CUDA SDK gives a list of errors like

Traceback (most recent call last):
  File "/usr/share/gdb/python/gdb/__init__.py", line 169, in _auto_load_packages
    __import__(modname)
  File "/usr/share/gdb/python/gdb/command/explore.py", line 746, in <module>
    Explorer.init_env()
  File "/usr/share/gdb/python/gdb/command/explore.py", line 135, in init_env
    gdb.TYPE_CODE_RVALUE_REF : ReferenceExplorer,
AttributeError: 'module' object has no attribute 'TYPE_CODE_RVALUE_REF'

so I decided to rebuild it with the Python version installed in the system, and this turned out to be a difficult task

The first question is: where is the source code? It seems that the official repository does not contain CUDA-specific code, so the raison d'être of that repo is totally unclear. I extracted cuda-gdb-13.1.68.src.tar.gz from the CUDA SDK .deb archive and proceeded with it

Second, the process of configuring is extremely fragile: if you pass a single wrong option, you will find out about it only after 30-40 min. Also, it seems that you just can't run configure in sub-dirs, because in that case the linker will complain about tons of missing symbols. So the configuration was found by trial and error:
configure --with-python=/usr/bin/python3 --enable-cuda

And finally we get the file gdb/gdb with a size of 190 Mb. And after running it I got a stack trace beginning with
arch-utils.c:1374: internal-error: gdbarch: Attempt to register unknown architecture (2)

This all raises some questions for nvidia:

  • do they test their CUDA SDK before releasing?
  • do they have QA at all, or do they, like Microsoft, just test their AI shit directly on users?
  • from which sources was the original cuda-gdb actually built?

Well, at least, having some suspicious source code, we can fix this build.