I made a pale analog of the world-famous pdbdump to dump types and functions from DWARF. Before introducing my tool, I have a few words about DWARF: it is redundant, compiler-specific, inconsistent and dangerous.
Redundancy
gcc and llvm put the full set of used types into every compilation unit. This is really terrible if you use lots of templates such as STL/boost: you end up with duplicated declarations of std::map, std::string etc. Yep, this is the main reason why stripped binaries become much smaller:
ls -l llvm-dwarfdump llvm-dwarfdump.stripped
-rwxrwxr-x 1 redp redp 471241104 mar 29 00:52 llvm-dwarfdump
-rwxrwxr-x 1 redp redp 22170696 mar 29 17:49 llvm-dwarfdump.stripped
Another example: let's check how many times console_printk is declared in the debug info of the linux kernel:
grep console_printk vm.g | wc -l
2883
It is the same declaration from file include/linux/printk.h, line 65, column 0xc. Why can't the linker merge its type when producing the debug output?
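To make the merging idea concrete, here is a minimal sketch that keys declarations by name plus declaration coordinates and counts the copies. The record layout and the count_duplicates helper are hypothetical. A real tool would also have to resolve DW_AT_decl_file through each unit's file table first, since that attribute is a per-unit index and not a path.

```python
from collections import Counter

# Hypothetical, simplified records: (name, decl_file, decl_line, decl_column).
# decl_file is assumed to be already resolved to a path; in real DWARF it is
# an index into the line-table file list of the owning compilation unit.
def count_duplicates(decls):
    """Return {declaration key: number of copies} for keys seen more than once."""
    counts = Counter(decls)
    return {key: n for key, n in counts.items() if n > 1}

decls = [
    ("console_printk", "include/linux/printk.h", 65, 0xC),  # copy from unit A
    ("console_printk", "include/linux/printk.h", 65, 0xC),  # copy from unit B
    ("some_other_decl", "include/linux/printk.h", 100, 5),  # made-up entry
]
dups = count_duplicates(decls)
for key, n in dups.items():
    print(key, "declared", n, "times")
```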
Golang tries to fix this problem by declaring types once and then referring to them from other units (and at the same time compressing debug sections with zlib). This is very ironic, because Go binaries typically are several Mb in size anyway (btw llvm-dwarfdump cannot process compressed sections).
Compiler-specific
This is pretty obvious: each programming language has some unique features and DWARF must deal with all of them.
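For reference, a rough sketch of unpacking one zlib-compressed debug section. It assumes the legacy GNU-style .zdebug_* layout: the 4-byte magic "ZLIB", an 8-byte big-endian uncompressed size, then a plain zlib stream. Whether a given Go binary uses this layout or the newer SHF_COMPRESSED ELF header depends on the toolchain version, and the second case is not handled here. The decompress_zdebug name is made up.

```python
import struct
import zlib

def decompress_zdebug(data: bytes) -> bytes:
    """Inflate the body of a legacy .zdebug_* section (assumed layout)."""
    if len(data) < 12 or data[:4] != b"ZLIB":
        raise ValueError("not a ZLIB-prefixed section")
    # 8-byte big-endian size of the uncompressed data
    (size,) = struct.unpack(">Q", data[4:12])
    out = zlib.decompress(data[12:])
    if len(out) != size:
        raise ValueError("uncompressed size does not match the header")
    return out

# Self-made test section, not taken from a real binary
payload = b"DW_TAG_compile_unit" * 4
section = b"ZLIB" + struct.pack(">Q", len(payload)) + zlib.compress(payload)
restored = decompress_zdebug(section)
```

The size check matters: the header value comes from the file, so it should be verified and not trusted.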
But just look at this:
<0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
<c> DW_AT_name : internal/cpu
<19> DW_AT_language : 22 (Go)
<1a> DW_AT_stmt_list : 0x0
<1e> DW_AT_low_pc : 0x401000
<26> DW_AT_ranges : 0x0
<2a> DW_AT_comp_dir : .
<2c> DW_AT_producer : Go cmd/compile go1.13.8
<44> Unknown AT value: 2905: cpu
I was unable to find the meaning of this custom attribute in the golang sources.
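What can be said for sure is that the code falls into the range that the DWARF spec reserves for vendor extensions (DW_AT_lo_user 0x2000 to DW_AT_hi_user 0x3fff), since readelf prints unknown attribute codes in hex. A small sketch of that check follows; is_vendor_attribute is a hypothetical helper. As an unverified guess: Go's cmd/internal/dwarf package appears to define 0x2905 as DW_AT_go_package_name, which would fit the value "cpu" for internal/cpu, but this was not checked against go1.13.8.

```python
DW_AT_LO_USER = 0x2000  # first attribute code reserved for vendor extensions
DW_AT_HI_USER = 0x3FFF  # last attribute code reserved for vendor extensions

def is_vendor_attribute(code: int) -> bool:
    """True if the attribute code lies in the vendor-extension range."""
    return DW_AT_LO_USER <= code <= DW_AT_HI_USER

# readelf shows "Unknown AT value: 2905"; the number is hexadecimal
attr = int("2905", 16)
print(hex(attr), is_vendor_attribute(attr))
```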
Inconsistency
The DWARF specification doesn't define lots of important things. Just to name a few:
- the order of tags, so you can have formal parameters mixed with types at the same nesting level
- which attributes are mandatory for which tags; I saw lots of missing DW_AT_sibling, for example
- when location info should be placed in the separate section .debug_loc; it seems that this happens for inlined subroutines only
- the encoding of addresses. You have DW_AT_low_pc for a function's address. But there are also DW_AT_abstract_origin and DW_AT_specification. Via these attributes the same function can have different addresses even in plain C:
<1><191cde>: Abbrev Number: 194 (DW_TAG_subprogram)
<191ce0> DW_AT_external : 1
<191ce0> DW_AT_name : (indirect string, offset: 0x24d2f): perf_events_lapic_init
<191ce4> DW_AT_decl_file : 1
<191ce5> DW_AT_decl_line : 1719
<191ce7> DW_AT_decl_column : 6
<191ce8> DW_AT_prototyped : 1
<191ce8> DW_AT_inline : 1 (inlined)
<1><19a945>: Abbrev Number: 96 (DW_TAG_subprogram)
<19a946> DW_AT_abstract_origin: <0x191cde>
<19a94a> DW_AT_low_pc : 0xffffffff81004dc0
<1><19b3c7>: Abbrev Number: 96 (DW_TAG_subprogram)
<19b3c8> DW_AT_abstract_origin: <0x191cde>
<19b3cc> DW_AT_low_pc : 0xffffffff81007930
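A dumper therefore has to follow these links to learn which function an address belongs to. Below is a minimal sketch over the three entries from the dump above. The dict-based model and both helper names are hypothetical, not how any real DWARF library represents entries.

```python
# Hypothetical in-memory model: entry offset -> attributes of interest
dies = {
    0x191CDE: {"name": "perf_events_lapic_init", "inline": 1},
    0x19A945: {"abstract_origin": 0x191CDE, "low_pc": 0xFFFFFFFF81004DC0},
    0x19B3C7: {"abstract_origin": 0x191CDE, "low_pc": 0xFFFFFFFF81007930},
}

def resolve_name(dies, offset, max_hops=16):
    """Follow abstract_origin/specification links until a name is found."""
    for _ in range(max_hops):  # hop limit guards against reference loops
        die = dies.get(offset)
        if die is None:
            return None
        if "name" in die:
            return die["name"]
        offset = die.get("abstract_origin", die.get("specification"))
        if offset is None:
            return None
    return None

def addresses_by_name(dies):
    """Group every low_pc under the name its entry resolves to."""
    result = {}
    for offset, die in dies.items():
        if "low_pc" in die:
            result.setdefault(resolve_name(dies, offset), []).append(die["low_pc"])
    return result

funcs = addresses_by_name(dies)
for name, addrs in funcs.items():
    print(name, [hex(a) for a in addrs])
```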
All of this leads us to the conclusion that DWARF is just
Dangerous
A true anti-debugging trick: what if the DW_AT_type attribute of a DW_TAG_pointer_type points to the same tag? How about a negative offset in DW_AT_sibling? I believe that this is a very rich area for fuzzing.
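A parser that wants to survive such input needs at least two guards, sketched below under a simplified, hypothetical model (entries as dicts keyed by offset): remember visited offsets while following DW_AT_type, and accept a sibling reference only if it points forward and stays inside the unit.

```python
def follow_type_chain(dies, offset):
    """Follow DW_AT_type links; return the last offset or raise on bad input."""
    seen = set()
    while True:
        if offset in seen:
            raise ValueError(f"type cycle at {offset:#x}")
        seen.add(offset)
        die = dies.get(offset)
        if die is None:
            raise ValueError(f"dangling type reference {offset:#x}")
        nxt = die.get("type")
        if nxt is None:  # no further DW_AT_type: end of the chain
            return offset
        offset = nxt

def sibling_is_sane(die_offset, sibling_offset, unit_end):
    """A sibling must lie strictly after its entry and inside the unit."""
    return die_offset < sibling_offset < unit_end

# Made-up inputs: a normal pointer-to-base-type chain and a self-reference
good = {0x10: {"tag": "pointer_type", "type": 0x20}, 0x20: {"tag": "base_type"}}
bad = {0x10: {"tag": "pointer_type", "type": 0x10}}

print(hex(follow_type_chain(good, 0x10)))
try:
    follow_type_chain(bad, 0x10)
except ValueError as err:
    print("rejected:", err)
```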
Features of dwarfdump