Previous part
I've noticed that almost all real API functions have the same prologue, like this:
mov eax, cs:dword_5E14C00 ; unique for each API function
mov [rbp+var_D0], 3E7h
mov [rbp+var_C0], 0
mov [rbp+var_C8], 0
test eax, eax
jz short loc_39603B
lea rdi, [rbp+var_C0]
call sub_2EE190 ; get data from pthread_getspecific
test eax, eax
jz loc_396118
loc_396118:
lea rbx, aCustreamupdate_5 ; "cuStreamUpdateCaptureDependencies_ptsz"
mov [rbp+var_88], rdx
call call_dbg
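Read as C, the prologue seems to do roughly the following. This is only a sketch under assumptions: every identifier in it (`g_trace_enabled`, `g_dbg_callback`, `get_thread_data`, `recording_callback`, `api_function_sketch`) is hypothetical and merely stands in for the corresponding address in the listing, and the callback signature is a guess, not something recovered from the binary.

```c
#include <stdio.h>

/* Hypothetical callback type; the real signature is not known from the listing. */
typedef void (*dbg_callback_t)(const char *api_name, void *thread_data);

static dbg_callback_t g_dbg_callback = NULL; /* stands in for the pointer used by call_dbg */
static int g_trace_enabled = 0;              /* stands in for cs:dword_5E14C00 (one per API function) */
static char g_last_traced[64];               /* only here so the sketch has something observable */

/* Stands in for sub_2EE190; assumed to return 0 on success. */
static int get_thread_data(void **out)
{
    static int dummy;
    *out = &dummy;
    return 0;
}

/* Example callback that just records the API name it was called with. */
void recording_callback(const char *api_name, void *thread_data)
{
    (void)thread_data;
    snprintf(g_last_traced, sizeof g_last_traced, "%s", api_name);
}

int api_function_sketch(void)
{
    /* 0x3E7 is 999, the value of CUDA_ERROR_UNKNOWN in cuda.h;
       presumably var_D0 is a default result (assumption). */
    int status = 0x3E7;
    void *thread_data = NULL;

    /* Fast path: the per-function flag is zero, so nothing is traced. */
    if (g_trace_enabled != 0) {
        if (get_thread_data(&thread_data) == 0 && g_dbg_callback != NULL)
            g_dbg_callback("cuStreamUpdateCaptureDependencies_ptsz", thread_data);
    }

    status = 0; /* the real body of the API function would run here */
    return status;
}
```

With the flag left at zero the function never touches the callback, which would explain why the check costs almost nothing when no debugger is attached.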
So I extracted dbg_callback and the array of debug tracepoints from cudbgApiDetach (see the method try_dbg_flag). I don't know why the debugger needs them; probably they are part of event tracing.
When you run your program under cuda-gdb, this callback is set:
api_gate at 0x155554e11940 (155552A2CB50) - /lib/x86_64-linux-gnu/libcudadebugger.so.1
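One common way to attribute an address like this to a shared library on Linux is to look it up in `/proc/self/maps`. The sketch below is not necessarily how the output above was produced; the helper names `maps_line_contains` and `find_module` are hypothetical, and paths containing spaces are not handled.

```c
#include <stdio.h>

/* Returns 1 and copies the mapped file path if addr lies inside the
   range described by one /proc/<pid>/maps line, otherwise returns 0.
   Line layout: start-end perms offset dev inode pathname */
int maps_line_contains(const char *line, unsigned long addr,
                       char *path, size_t path_len)
{
    unsigned long start, end;
    char perms[5];
    char name[256] = ""; /* stays empty for anonymous mappings */

    int n = sscanf(line, "%lx-%lx %4s %*x %*x:%*x %*u %255s",
                   &start, &end, perms, name);
    if (n < 3 || addr < start || addr >= end)
        return 0;
    snprintf(path, path_len, "%s", name);
    return 1;
}

/* Scans the mappings of the current process for addr. */
int find_module(unsigned long addr, char *path, size_t path_len)
{
    char line[512];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = maps_line_contains(line, addr, path, path_len);
    fclose(f);
    return found;
}
```

Calling `find_module` with the value of the callback pointer, from inside the process being debugged, should yield the path of the library that owns it.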