пятница, 8 марта 2024 г.

Profiling shared libraries on linux

Disclaimer: proposed approach uses dirty hacks & patches and tested on x86_64 only so use it at your own risk. Also no chatGPT or some another Artificial Idiots were used for this research

Lets assume that we have shared library (for example R extension or python module) and we want to know where and why it spending many hours and consuming megawatts of electricity. There is even semi-official way to do this:

  1. compile shared library with -pg option
  2. set envvar LD_PROFILE_OUTPUT to directory where you want to store profiling data
  3. set envvar LD_PROFILE to filename of library to profile
  4. run your program. Well, sounds that you need lots of things to do before this step and you can`t set up profiling dynamically
  5. run sprof on profiling log

Unfortunately this method just don`t work - sprof fails with cryptic message
Inconsistency detected by ld.so: dl-open.c: 890: _dl_open: Assertion `_dl_debug_initialize (0, args.nsid)->r_state == RT_CONSISTENT' failed!

Seems that this long lived bug known since 2017 and still not fixed
Lets try to discover some more reliable way and start with inspection of code generated for profiling
 

gcc -pg

You can see in disasm that almost every function after prologue has call to _mcount. It located in GOT and thus avoiding PLT black magic. Interesting to note that it has no args - prototype is just void mcount(void) and this leads to two consequences
  1. it extracts return address from stack
  2. and so cannot be called as ordinary C function - you need to add asm thunk like "jmp [mcount_ptr]" in your wrapper
 
Besides ELF binaries have yet couple of profile related calls in __gmon_start__ function - __monstartup and mcleanup registered with atexit. 
 
Prototype for monstartup:
void monstartup (char *low_pc, char *high_pc)
It called to start profiling data and all mcount calls must be located within range low_pc .. high_pc
 
Function mcleanup write profiler logs to disc and has prototype
void _mcleanup (void)
 
Shared libraries have only mcount calls and this is why you can`t start profiling right after loading of shared library - you need some code to start/stop profiling

It so happened that -pg is not only option for profiling - there is second:

gcc -finstrument-functions

gcc has in FUNCTION_DECL attribute DECL_NO_INSTRUMENT_FUNCTION_ENTRY_EXIT
to indicate that function entry and exit should be instrumented with calls to support routines
This "support routines" are

void __cyg_profile_func_enter(void *this_fn, void *call_site) 

void __cyg_profile_func_exit(void *this_fn, void *call_site) 

Seems that you even can write your own profiler from scratch! Important to note that at least on x86_64 ABI you can patch GOT pointer __cyg_profile_func_enter to mcount - bcs first 2 arguments passed in registers

 

making API for profiling

So main idea is to write some code to 
  1. check if desired shared library can be profiled (e.q. has mcount or __cyg_profile_func_enter in GOT) - I used ELFIO but perhaps this can be done with LIEF too
  2. patch GOT
  3. find addresses of loaded shared llibrary - this can be done with dl_iterate_phdr
  4. call monstartup to start profiling
  5. run your code from profiling library
  6. call mcleanup to stop profiling and store results
Surprisingly now we have file gmon.out! Lets run 
gprof path2your_shared_library gmon.out

and see sad results - gprof refusing to show results. This is bcs it don`t know at which address your shared library was loaded (and worse - this address will differs each run due to ASLR). So lets

patch gprof

It sounds scary but in reality the most difficult thing was to find not used letter in getopts. I chose -U to pass image base and subtracting passed value in function gmon_io_read_vma. Patch

Because now you also need to know base address for gprof I just add it to name of gmon.out - via setting GMON_OUT_PREFIX. Run patched gprof:

 gprof -b -U 7F9E01224000 ./libprelf.so gmon.7F9E01224000.3926320

Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00    12552     0.00     0.00  ELFIO::endianess_convertor::operator()(unsigned long) const
  0.00      0.00     0.00     4608     0.00     0.00  ELFIO::section_impl<ELFIO::Elf64_Shdr>::get_size() const
  0.00      0.00     0.00     4398     0.00     0.00  ELFIO::section_impl<ELFIO::Elf64_Shdr>::get_entry_size() const
  0.00      0.00     0.00     2963     0.00     0.00  ELFIO::endianess_convertor::operator()(unsigned int) const
  0.00      0.00     0.00     2935     0.00     0.00  ELFIO::section_impl<ELFIO::Elf64_Shdr>::get_data() const

...

Комментариев нет:

Отправить комментарий