Thursday, March 30, 2023

dwarfdump

I made a pale analog of the world-famous pdbdump to dump types and functions from DWARF. Before introducing my tool I have a few words to say about DWARF: it is redundant, compiler-specific, inconsistent and dangerous.

Redundancy

gcc and llvm put the full set of used types into every compilation unit. This is really terrible if you use lots of templates like STL/boost: you get duplicated declarations of std::map, std::string etc. Yep, this is the main reason why stripped binaries become so much smaller:

ls -l llvm-dwarfdump llvm-dwarfdump.stripped

-rwxrwxr-x 1 redp redp 471241104 mar 29 00:52 llvm-dwarfdump
-rwxrwxr-x 1 redp redp 22170696  mar 29 17:49 llvm-dwarfdump.stripped

Another example: let's check how many times console_printk is declared in the debug info of the linux kernel:
grep console_printk vm.g | wc -l
2883

It is the same symbol, declared in file include/linux/printk.h, line 65, column 0xc. Why can't the linker merge its type when producing debug output?
 
Golang tries to fix this problem by declaring each type once and then referring to it from other units (and at the same time compressing debug sections with zlib). This is rather ironic, because Go binaries are typically several MB in size anyway (btw llvm-dwarfdump cannot process compressed sections)

 

Compiler-specific

This is pretty obvious: each programming language has some unique features and DWARF must deal with all of them.
But just look at this:
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_name        : internal/cpu
    <19>   DW_AT_language    : 22       (Go)
    <1a>   DW_AT_stmt_list   : 0x0
    <1e>   DW_AT_low_pc      : 0x401000
    <26>   DW_AT_ranges      : 0x0
    <2a>   DW_AT_comp_dir    : .
    <2c>   DW_AT_producer    : Go cmd/compile go1.13.8
    <44>   Unknown AT value: 2905: cpu

I was unable to find the meaning of this custom attribute in the golang sources

 

Inconsistency

The DWARF specification doesn't define lots of important things. Just to name a few:
  • order of tags, so you can have a mix of formal parameters and types at the same nesting level
  • which attributes are mandatory for which tags - I saw lots of missing DW_AT_sibling, for example
  • when location info should be placed in the separate .debug_loc section - it seems that this happens for inlined subroutines only
  • encoding of addresses. You have DW_AT_low_pc for the address of a function. But there are also DW_AT_abstract_origin (and DW_AT_specification). Via these attributes the same function can have different addresses even in plain C:
     <1><191cde>: Abbrev Number: 194 (DW_TAG_subprogram)
        <191ce0>   DW_AT_external    : 1
        <191ce0>   DW_AT_name        : (indirect string, offset: 0x24d2f): perf_events_lapic_init
        <191ce4>   DW_AT_decl_file   : 1
        <191ce5>   DW_AT_decl_line   : 1719
        <191ce7>   DW_AT_decl_column : 6
        <191ce8>   DW_AT_prototyped  : 1
        <191ce8>   DW_AT_inline      : 1    (inlined)
     <1><19a945>: Abbrev Number: 96 (DW_TAG_subprogram)
        <19a946>   DW_AT_abstract_origin: <0x191cde>
        <19a94a>   DW_AT_low_pc      : 0xffffffff81004dc0
     <1><19b3c7>: Abbrev Number: 96 (DW_TAG_subprogram)
        <19b3c8>   DW_AT_abstract_origin: <0x191cde>
        <19b3cc>   DW_AT_low_pc      : 0xffffffff81007930
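
To make the listing above concrete, here is a minimal Python sketch of how a dumper could collect all addresses of one function by following DW_AT_abstract_origin. The dict-based DIE records and the collect_addresses helper are assumptions made for illustration, not the real data structures of dwarfdump; only the offsets, the name and the addresses come from the listing.

```python
from collections import defaultdict

# Hypothetical, simplified DIE records. Offsets, name and addresses are
# copied from the readelf listing above; the dict layout is an assumption.
dies = [
    {"offset": 0x191cde, "tag": "DW_TAG_subprogram",
     "name": "perf_events_lapic_init", "inline": 1},
    {"offset": 0x19a945, "tag": "DW_TAG_subprogram",
     "abstract_origin": 0x191cde, "low_pc": 0xffffffff81004dc0},
    {"offset": 0x19b3c7, "tag": "DW_TAG_subprogram",
     "abstract_origin": 0x191cde, "low_pc": 0xffffffff81007930},
]

def collect_addresses(dies):
    """Map each function name to the addresses of all its concrete instances."""
    by_offset = {d["offset"]: d for d in dies}
    addrs = defaultdict(list)
    for d in dies:
        if d["tag"] != "DW_TAG_subprogram" or "low_pc" not in d:
            continue  # abstract instance: carries the name but no address
        # Follow DW_AT_abstract_origin (if present) to the DIE holding the name.
        origin = by_offset.get(d.get("abstract_origin"), d)
        name = origin.get("name")
        if name is not None:
            addrs[name].append(d["low_pc"])
    return dict(addrs)

result = collect_addresses(dies)
for name, pcs in result.items():
    print(name, [hex(pc) for pc in pcs])
```

The point of the sketch is that a function name cannot be treated as a key with a single address: the consumer has to keep a list.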


All of this leads us to the conclusion that DWARF is just

Dangerous

A true anti-debugging trick: what if the DW_AT_type attribute of a DW_TAG_pointer_type points to the same tag? How about a negative offset in DW_AT_sibling? I believe this is a very rich area for fuzzing
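
A defensive parser can guard against both tricks with very little code. The Python sketch below is only an illustration under assumed data structures: the types table (DIE offset mapped to a tag and the offset referenced by DW_AT_type) and both helper names are hypothetical.

```python
def resolve_type_chain(types, start):
    """Follow DW_AT_type links from `start`; raise ValueError on a loop.

    `types` is a hypothetical table: DIE offset -> (tag, referenced offset or None).
    """
    seen = set()
    chain = []
    cur = start
    while cur is not None:
        if cur in seen:
            raise ValueError(f"DW_AT_type loop at DIE 0x{cur:x}")
        if cur not in types:
            raise ValueError(f"DW_AT_type points to unknown DIE 0x{cur:x}")
        seen.add(cur)
        tag, ref = types[cur]
        chain.append(tag)
        cur = ref
    return chain

def sibling_is_sane(die_offset, sibling_offset, unit_end):
    """Accept DW_AT_sibling only if it moves strictly forward inside the unit."""
    return die_offset < sibling_offset < unit_end

# A well-formed chain and a pointer type that refers to itself.
good = {0x10: ("DW_TAG_pointer_type", 0x20), 0x20: ("DW_TAG_base_type", None)}
evil = {0x30: ("DW_TAG_pointer_type", 0x30)}

print(resolve_type_chain(good, 0x10))
try:
    resolve_type_chain(evil, 0x30)
except ValueError as err:
    print("rejected:", err)
```

Without the seen set, a naive "follow DW_AT_type until a base type" loop never terminates on the second table, and a sibling offset that points backwards would send a sibling-skipping walker into the same kind of endless loop.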

 

Features of dwarfdump

dwarfdump can parse little-endian 32 or 64 bit ELF files and supports compressed sections (both the golang flavour and SHF_COMPRESSED with zlib)
It can dump types (structures, unions, classes, enums etc), functions, methods (including vtbl index) and variables
Where possible it shows addresses of functions, methods (they can have several addresses, for example one per template specialization) and variables
It can also show the location of formal parameters, like their offset on the stack or the register they are passed in
dwarfdump has two output formats:
  • JSON
  • plain C and some subset of C++. This output may look strange for other languages like Go. I am too lazy to develop renderers for other languages
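
For readers curious about the two flavours of compressed sections mentioned above, here is a minimal Python sketch of how each could be unpacked. It reflects my understanding of the formats (legacy .zdebug_* sections start with the magic "ZLIB" and an 8-byte big-endian size; SHF_COMPRESSED sections start with an Elf32_Chdr or Elf64_Chdr); the function names are hypothetical and this is not the code dwarfdump uses.

```python
import struct
import zlib

ELFCOMPRESS_ZLIB = 1  # ch_type value meaning zlib in the ELF gABI

def decompress_zdebug(data):
    """Legacy .zdebug_* payload: b'ZLIB' + 8-byte big-endian size + zlib stream."""
    if len(data) < 12 or data[:4] != b"ZLIB":
        raise ValueError("not a .zdebug_* payload")
    (size,) = struct.unpack(">Q", data[4:12])
    out = zlib.decompress(data[12:])
    if len(out) != size:
        raise ValueError("uncompressed size mismatch")
    return out

def decompress_shf(data, is64, little_endian=True):
    """SHF_COMPRESSED payload: compression header followed by a zlib stream."""
    end = "<" if little_endian else ">"
    if is64:
        # Elf64_Chdr: ch_type, ch_reserved (u32 each), ch_size, ch_addralign (u64 each)
        fmt = end + "IIQQ"
        ch_type, _reserved, ch_size, _align = struct.unpack_from(fmt, data)
    else:
        # Elf32_Chdr: ch_type, ch_size, ch_addralign (u32 each)
        fmt = end + "III"
        ch_type, ch_size, _align = struct.unpack_from(fmt, data)
    if ch_type != ELFCOMPRESS_ZLIB:
        raise ValueError("unsupported compression type %d" % ch_type)
    out = zlib.decompress(data[struct.calcsize(fmt):])
    if len(out) != ch_size:
        raise ValueError("uncompressed size mismatch")
    return out

# Round trip on synthetic data (not a real debug section).
payload = b"fake .debug_info bytes " * 50
zdebug = b"ZLIB" + struct.pack(">Q", len(payload)) + zlib.compress(payload)
shf64 = struct.pack("<IIQQ", ELFCOMPRESS_ZLIB, 0, len(payload), 1) + zlib.compress(payload)
print(decompress_zdebug(zdebug) == payload, decompress_shf(shf64, True) == payload)
```

Checking the declared size against the real output is cheap and catches truncated or tampered sections early.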

 

What is not supported

Sections with .dwo suffix
 
Inlined functions (tag DW_TAG_inlined_subroutine)
 
Local variables in functions - they are located inside lexical blocks together with local types. These local types can be included in the output with the -L option (but not the local vars)
 
C++ template specializations - it seems that the C++ compiler includes this information in mangled names anyway, so this is not a big loss
 
And of course tags and attributes for languages other than C/C++, like DW_TAG_with_stmt, DW_TAG_variant_part (used in Ada & Rust) or DW_TAG_dwarf_procedure (I don't even have an idea why it was added to the DWARF spec) etc
 
There are probably many more unsupported things I don't even know about

 

Command-line options

-d - dump lots of useless debug output
-f - include functions. if omitted only types will be dumped
-g - because golang reuses types located in any compilation unit, output is produced only after the whole debug info has been parsed
-j - produce JSON. if omitted the plain c/c++ renderer will be used
-k - keep dumped types - because in one module type A can be just declared (but with a constructor/destructor, for example) while its members are defined in some other module
-l - add nesting level to JSON output for each type
-L - process lexical blocks. This significantly slows down processing
-o <output filename>. if omitted stdout will be used
-v - verbose mode
-V - include global variables

 

Performance

Timings for processing the biggest module I was able to find on my Ubuntu - libLTO.so from a fresh llvm build (3.5 GB!):
ls -l libLTO.so.17git
-rwxrwxr-x 1 redp redp 3519267480 mar 28 23:47 libLTO.so.17git

time objdump -g -Wi ./libLTO.so.17git | tail
real    10m50,079s
user    10m29,327s
sys    0m52,881s

time llvm-dwarfdump --debug-info libLTO.so.17git | tail
real    8m13,707s
user    8m10,424s
sys    0m36,764s

time ../dumper -v -V -f -k -L libLTO.so.17git | tail
real    1m11,879s
user    0m48,765s
sys    0m5,249s

dwarfdump outperforms objdump & llvm-dwarfdump because it parses only the necessary sections of debug info and simply skips unsupported tags, relying heavily on the DW_AT_sibling attribute. Given that almost every function has one or more lexical blocks, this cuts processing time at least in half
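
The skipping idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the flat list models DIEs in file order with null entries (tag None) closing each list of children, and walk, skip_subtree and SUPPORTED are hypothetical names, not the real dwarfdump internals.

```python
SUPPORTED = {"DW_TAG_compile_unit", "DW_TAG_subprogram",
             "DW_TAG_formal_parameter", "DW_TAG_structure_type"}

def skip_subtree(dies, i):
    """Slow path: step over DIE i and its descendants one entry at a time."""
    steps = 1
    if not dies[i].get("children"):
        return i + 1, steps
    depth = 1
    i += 1
    while i < len(dies) and depth > 0:
        steps += 1
        d = dies[i]
        if d["tag"] is None:        # null entry closes one level of children
            depth -= 1
        elif d.get("children"):
            depth += 1
        i += 1
    return i, steps

def walk(dies, supported):
    """Return (tags that were processed, number of entries that were decoded)."""
    index_of = {d["offset"]: i for i, d in enumerate(dies)}
    visited, decoded = [], 0
    i = 0
    while i < len(dies):
        d = dies[i]
        if d["tag"] is None or d["tag"] in supported:
            if d["tag"] is not None:
                visited.append(d["tag"])
            decoded += 1
            i += 1
            continue
        sib = d.get("sibling")
        if sib is not None and sib > d["offset"] and sib in index_of:
            decoded += 1
            i = index_of[sib]       # fast path: one jump over the whole subtree
        else:
            i, steps = skip_subtree(dies, i)
            decoded += steps
    return visited, decoded

dies = [
    {"offset": 0x0b, "tag": "DW_TAG_compile_unit", "children": True},
    {"offset": 0x20, "tag": "DW_TAG_subprogram", "children": True},
    {"offset": 0x30, "tag": "DW_TAG_formal_parameter", "children": False},
    {"offset": 0x38, "tag": "DW_TAG_lexical_block", "children": True, "sibling": 0x58},
    {"offset": 0x40, "tag": "DW_TAG_variable", "children": False},
    {"offset": 0x48, "tag": "DW_TAG_variable", "children": False},
    {"offset": 0x50, "tag": None},
    {"offset": 0x58, "tag": "DW_TAG_lexical_block", "children": True},
    {"offset": 0x60, "tag": "DW_TAG_variable", "children": False},
    {"offset": 0x68, "tag": None},
    {"offset": 0x69, "tag": None},
    {"offset": 0x70, "tag": "DW_TAG_structure_type", "children": False},
    {"offset": 0x80, "tag": None},
]

visited, decoded = walk(dies, SUPPORTED)
print(visited, decoded, "of", len(dies))
```

On this toy unit the walker decodes 10 of the 13 entries: the first lexical block is skipped with a single jump, while the second one has no DW_AT_sibling and has to be stepped through entry by entry. The saving grows with the size of the skipped subtrees.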

 

Example of output

A good sample of a structure with lots of anonymous nested types is restart_block:

struct restart_block {
// Offset 0x0
long unsigned int arch_data;
// Offset 0x8
long int (*fn)(struct restart_block*);
// Offset 0x10
union {
    // Offset 0x0
    struct {
        // Offset 0x0
        u32* uaddr;
        // Offset 0x8
        u32 val;
        // Offset 0xC
        u32 flags;
        // Offset 0x10
        u32 bitset;
        // Offset 0x18
        u64 time;
        // Offset 0x20
        u32* uaddr2;
      } futex;
    // Offset 0x0
    struct {
        // Offset 0x0
        clockid_t clockid;
        // Offset 0x4
        enum timespec_type type;
        // Offset 0x8
        union {
            // Offset 0x0
            struct __kernel_timespec* rmtp;
            // Offset 0x0
            struct old_timespec32* compat_rmtp;
          };
        // Offset 0x10
        u64 expires;
      } nanosleep;
    // Offset 0x0
    struct {
        // Offset 0x0
        struct pollfd* ufds;
        // Offset 0x8
        int nfds;
        // Offset 0xC
        int has_timeout;
        // Offset 0x10
        long unsigned int tv_sec;
        // Offset 0x18
        long unsigned int tv_nsec;
      } poll;
  };
};

A function with a strange calling convention from the linux kernel:
// Addr 0xFFFFFFFF810082F0
// TypeId 1A408F
// kobj id 1A40B1: OP_reg rdi
// attr id 1A40BF: OP_reg rsi
// i id 1A40CD: OP_reg rdx
umode_t not_visible(struct kobject* kobj,struct attribute* attr,int i);

Class tableRegNames:
// Size 0x18
struct tableRegNames {
// Offset 0x8
const const char** tab_;
// Offset 0x10
size_t tab_size;
// --- methods
 tableRegNames(struct tableRegNames* this,struct tableRegNames&&);
 tableRegNames(struct tableRegNames* this,const struct tableRegNames&);
// specification
//  addr AC42 type_id 39AE5 _ZN13tableRegNamesC2EPKPKcm
 tableRegNames(struct tableRegNames* this,const const char**,size_t);
// Vtbl index 2
// specification
//  addr AC90 type_id 39A78
virtual const char* reg_name(struct tableRegNames* this,unsigned int);
// specifications: 2
//  addr AD2A type_id 3998E _ZN13tableRegNamesD0Ev
//  addr ACFC type_id 399BA _ZN13tableRegNamesD2Ev
virtual  ~tableRegNames(struct tableRegNames* this,int);
};
