Sunday, August 27, 2023

dwarf5 from clang 14

It seems that clang 14 uses more advanced DWARF5 features, so I added support for them to my dwarfdump. IMHO the most exciting features are:

Section .debug_line_str

In older DWARF versions, file names are duplicated in each compilation unit. Since DWARF version 5 they are stored in a separate section, so they are shared and save some space. Obviously this saving is negligible compared to the overhead from type duplication.
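
To illustrate the sharing, here is a minimal Python sketch. The section contents, the offsets and the helper name read_cstring are all made up for the example. The only real part is the idea that each unit stores an offset into .debug_line_str (the DW_FORM_line_strp form) instead of its own copy of the string:

```python
def read_cstring(section: bytes, offset: int) -> str:
    """Read a NUL-terminated string starting at `offset` in a section."""
    end = section.index(b"\x00", offset)
    return section[offset:end].decode("utf-8")


# Hypothetical .debug_line_str contents: three NUL-terminated strings.
debug_line_str = b"/src\x00main.c\x00util.h\x00"

# Hypothetical per-unit file tables: each entry is just an offset
# into .debug_line_str, so "util.h" is stored once and used twice.
cu1_files = [5, 12]
cu2_files = [12]

cu1_names = [read_cstring(debug_line_str, off) for off in cu1_files]
cu2_names = [read_cstring(debug_line_str, off) for off in cu2_files]
```

Both units end up with "util.h", but the bytes of that name exist only once in the file.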

Section .debug_str_offsets

Also to save space, each compilation unit has a so-called base offset into this section for its strings, passed via DW_AT_str_offsets_base. But there is a problem: some attributes can already refer to indexed strings before DW_AT_str_offsets_base occurs:


  <0><c>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <d>   DW_AT_producer    : (indexed string: 0): clang version 14.0.6 (git@github.com:github/semmle-code 5c87e7737f331823ed8ed280883888566f08cdea)
    <e>   DW_AT_language    : 33        (C++14)
    <10>   DW_AT_name        : (indexed string: 0x1): c/extractor/src/extractor.cpp
    <11>   DW_AT_str_offsets_base: 0x8


As you can see, two attributes here (DW_AT_producer and DW_AT_name) refer to indexed strings before we know the value of the string base. This is now much harder to parse in one pass.
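
One way around this is to postpone the lookup: remember the indexed strings while walking the attributes and resolve them once the base is known. The Python sketch below is only an illustration of that idea, not the code from my dwarfdump. The function names, the attribute tuples and the synthetic section bytes are assumptions, and it handles only 32-bit DWARF (4-byte entries after an 8-byte .debug_str_offsets header):

```python
import struct


def read_cstring(section: bytes, offset: int) -> str:
    """Read a NUL-terminated string starting at `offset` in a section."""
    end = section.index(b"\x00", offset)
    return section[offset:end].decode("utf-8")


def resolve_strx(index, base, debug_str_offsets, debug_str):
    """Indexed string: entry `index` after `base` gives an offset into .debug_str."""
    (str_off,) = struct.unpack_from("<I", debug_str_offsets, base + index * 4)
    return read_cstring(debug_str, str_off)


def resolve_cu_attributes(attrs, debug_str_offsets, debug_str):
    """attrs is a list of (name, form, value) in the order they appear in the DIE."""
    base = None
    pending = []   # indexed strings seen before the base is known
    resolved = {}
    for name, form, value in attrs:
        if name == "DW_AT_str_offsets_base":
            base = value
            resolved[name] = value
        elif form.startswith("DW_FORM_strx"):
            pending.append((name, value))
        else:
            resolved[name] = value
    if pending and base is None:
        raise ValueError("indexed strings without DW_AT_str_offsets_base")
    for name, index in pending:
        resolved[name] = resolve_strx(index, base, debug_str_offsets, debug_str)
    return resolved


# Synthetic sections that mimic the dump above (producer string shortened).
debug_str = b"clang version 14.0.6\x00c/extractor/src/extractor.cpp\x00"
second = debug_str.index(b"\x00") + 1
# Header: unit_length, version 5, padding; then two 4-byte offsets.
debug_str_offsets = struct.pack("<IHH", 12, 5, 0) + struct.pack("<II", 0, second)

cu_attrs = [
    ("DW_AT_producer", "DW_FORM_strx1", 0),
    ("DW_AT_language", "DW_FORM_data2", 33),
    ("DW_AT_name", "DW_FORM_strx1", 1),
    ("DW_AT_str_offsets_base", "DW_FORM_sec_offset", 8),
]
cu = resolve_cu_attributes(cu_attrs, debug_str_offsets, debug_str)
```

The price is a small second pass over the pending list for every unit, which is what makes one-pass parsing awkward.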

New locations format

I think this is the coolest and most useful feature: now each variable and parameter has a set of locations linked to address ranges (that's often the case for highly optimized code). Sample:

   Offset Entry 2077
    0024ef56 00000000000006b4 (index into .debug_addr) 004fb3c500000000 (base address)
    0024ef59 0000000000000000 000000000000001c DW_OP_reg5 (rdi)

This cryptic message means that, starting from address 0x4fb3c5, some local variable lives in register rdi for the next 0x1c bytes of code. (Note: most tools like objdump or llvm-dwarfdump cannot show these new locations correctly; in this case objdump printed the base address in a bad format.) It seems that neither IDA Pro nor Binary Ninja can use this debug information:
.text:00004FB3C5     mov     rdi, cs:compilation_tf
.text:00004FB3CC     cmp     dword ptr [rdi+0Ch], 0

The global variable compilation_tf has type a_trap_file_ptr, a pointer to a_trap_file. IDA Pro has this type information from the debug info, but it still cannot show the access to the field of a_trap_file at offset 0xC in the next instruction.
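
For reference, the location list entry from the sample can be decoded by hand. The Python sketch below handles only the three entry kinds that appear there (DW_LLE_base_addressx, DW_LLE_offset_pair, DW_LLE_end_of_list). The raw bytes are my reconstruction from the offsets in the dump, and addr_table is a plain dict standing in for a real .debug_addr lookup, so treat both as assumptions:

```python
DW_LLE_end_of_list = 0x00
DW_LLE_base_addressx = 0x01
DW_LLE_offset_pair = 0x04


def read_uleb128(data: bytes, pos: int):
    """Decode an unsigned LEB128 value, return (value, new position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):
            return result, pos
        shift += 7


def decode_loclist(data: bytes, addr_table: dict):
    """Return a list of (start, end, expression bytes) for one location list."""
    base = 0
    ranges = []
    pos = 0
    while True:
        kind = data[pos]
        pos += 1
        if kind == DW_LLE_end_of_list:
            break
        if kind == DW_LLE_base_addressx:
            # Index into .debug_addr selects the new base address.
            index, pos = read_uleb128(data, pos)
            base = addr_table[index]
        elif kind == DW_LLE_offset_pair:
            # Two offsets relative to the base, then a counted expression.
            start, pos = read_uleb128(data, pos)
            end, pos = read_uleb128(data, pos)
            length, pos = read_uleb128(data, pos)
            expr = data[pos:pos + length]
            pos += length
            ranges.append((base + start, base + end, expr))
        else:
            raise NotImplementedError(f"DW_LLE kind {kind:#x} is not handled here")
    return ranges


# Reconstructed bytes: base_addressx 0x6b4; offset_pair 0..0x1c, 1-byte
# expression 0x55 (DW_OP_reg5, rdi on x86-64); end_of_list.
sample = bytes([0x01, 0xB4, 0x0D, 0x04, 0x00, 0x1C, 0x01, 0x55, 0x00])
addr_table = {0x6B4: 0x4FB3C5}   # stand-in for .debug_addr
locations = decode_loclist(sample, addr_table)
```

With these inputs the single range is 0x4fb3c5 to 0x4fb3e1 with the one-byte expression 0x55, which is exactly the window where the disassembly above keeps compilation_tf in rdi.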

 
As a result of all my patches I can now, for example, inspect IL structures from the Microsoft CodeQL C++ extractor:
// Size 0x28
// FileName: c/extractor/edg/src/il_def.h
struct a_name_reference {
// Offset 0x0
a_name_reference_ptr next;
// Offset 0x8
a_name_qualifier_ptr qualifier;
// Offset 0x10
union {
    // Offset 0x0
    a_type_ptr destructor_type;
    // Offset 0x0
    a_property_or_event_descr_ptr property_or_event_descr;
  } variant;
// Offset 0x18
long num_template_arguments;
// Offset 0x20
enum a_special_function_kind special_kind;
// Offset 0x20
a_bit_field is_global_qualified_name:23:1;
// Offset 0x20
a_bit_field is_template_id:22:1;
// Offset 0x20
a_bit_field is_super_qualified:21:1;
// Offset 0x20
a_bit_field is_decltype_qualified:20:1;
// Offset 0x20
a_bit_field used_in_primary_declarator:19:1;
// Offset 0x20
a_bit_field from_prototype_instantiation:18:1;
};
