воскресенье, 31 декабря 2023 г.

Architecture and Design of the Linux Storage Stack

Not perfect but suitable book considering the small number of books about linux internals. IMHO most useful is chapter 10, so below is brief summary of the presented tools

And I have stupid question - has anyone already merged all this zoo in some cmdlet/package for linux powershell to have common API? At least I was unable to find something similar on powershellgallery

воскресенье, 17 декабря 2023 г.

Filling Trominos

IMHO this is very hard task - only 104 accepted solutions. My solution is here

Google gives lots of links for trominos but they all for totally different task from Euler Project - in our case we have only L-shapes. So lets think about possible algorithm

It`s pretty obvious that we can make 2 x 3 or 3 x 2 rectangles with couple of L-trominos. So naive solution is just to check if one size is divisible by 2 and other by 3

However with pen and paper you can quickly realize that you can for example fill rectangle 5 x 6:

aabaab
abbabb
ccddee
dcdced
ddccdd

Algo can look like (see function check2x3)

  • if one side of rectangle is divisible by 6 then another minus 2 should be divisible by 3
  • if one side of rectangle is divisible by 6 then another minus 3 should be divisible by 2

Submit our solution and from failed tests suddenly discovering that you also can have rectangle 9 x 5. Some details how this happens

So we can have maximal 3 groups of different shapes:

  • 9 x 5 rectangle (or even several if sides multiples of 5 & 9) - in my solution it stored in field has_95
  • 1 or 2 groups of 2 x 3 rectangles below 9 x 5 shape. 1 for case when you can fill this area with shapes 2 x 3 of the same orientation and 2 if you must mix vertical and horizontal rectangles - field trom
  • the same 1 or 2 groups on right of 9 x 5 shape - field right

Now the only remained problem is coloring

Rectangle 9 x 5 has 5 different colors but it is possible to arrange trominos in such way that on borders it will have only 4 colors and 5th is inside. For groups of 2 x 3 rectangles you need 4 colors if group size is 1 and yet 4 if size is 2. In worst case number of colors is 4 for 9 x 5 + 2 * 2 * 4 = 20 - so we can fit in A-Z

воскресенье, 12 ноября 2023 г.

my solutions for couple CSES tasks

CSES has two very similar by description tasks but with completely different solutions: "Critical Cities" (218 accepted solutions at time when I writing this) and "Visiting Cities" (381 accepted solutions)

Critical Cities

We are given an directed unweighted graph and seems that we need to find it`s dominators for example using Lengauer-Tarjan algo (with complexity O((V+E)log(V+E))
Then we could check each vertex in this dominators tree to see if it leads to target node, so overall complexity is O(V * (V+E)log(V+E))
 
This looks not very impressive IMHO. Lets try something completely different (c) Monty Python's Flying Circus. For example we could run wave (also known as Lee algorithm) from source to target and get some path with complexity O(V+E). Note that in worst case this path can contain all vertices. Lets mark all vertices in this path
Next we could continue to run waves but at this time ignoring edges from marked nodes and see what marked vertices are still reachable. For example on some step k we run wave from Vs and reached vertices Vi and Vj. We can conclude that all vertices in early found path between Vs and Vj are NOT critical cities. So we can repeat next step starting with Vj
This process can be repeated in worst case V times so overall complexity is O(V*(V+E))
 
My solution is here

 

Visiting Cities

At this time we are given an directed weighted graph and seems that simplest solution is to find all K-th shortest paths (for example with Yen algo) and make union of their vertices. Because I'm very lazy I decided to reuse some ready and presumably well-tested implementation of this algo. You can read about fabulous results here
 

After that I plunged into long thoughts until I decided to count how many paths of minimal length go through each vertex - actually we could run Dijkstra in both directions: from source to target and from target to source, counting number of paths with minimal length. And then we could select from this path vertices where product of direct counts with reverse equal to direct count on target (or reverse count on source) - it`s pretty obvious that you can`t avoid such vertices in any shortest path. Complexity of this solution is two times from Dijkstra algo (depending from implementation O(V^2) or O(V * log(V) + E * log(V)) using some kind of heap) + in worst case V checks for each vertices in first found shortest path

My solution is here

четверг, 9 ноября 2023 г.

kssp library, part 2

Previous part 

I was struck by the idea of how to reduce size of graph before enumerating all K-th shortest paths. We can use cut-points. By definition if we have cut-point in some shortest path it must be visited always - otherwise you can`t reach the destination. So algo is simple
  1. find with Dijkstra algo first shortest paths for whole graph
  2. find all cut-points in whole graph
  3. iterate over found shortest path - if current vertex is cut-point - we can run brute-force from previous cut-point till current

Results are crazy anyway - on the same test 7

  • Yen: 13.86s, 29787, 3501 cycles
  • Node Classification: 118.4s, 30013, 3501 cycles
  • Postponed Node Classification: 104.95s, 30013, 3501 cycles
  • PNC*: 120.33s, 30013, 3501 cycles
  • Parsimonious Sidetrack Based: 4.66s, 29980, 3501 cycles
  • Parsimonious Sidetrack Based v2: 4.79s, 29980, 3501 cycles
  • Parsimonious Sidetrack Based v3: 4.85s, 29980, 3501 cycles
  • Sidetrack Based: 4.18s, 29980, 3501 cycles
  • Sidetrack Based with update: 4.34s, 29980, 3501 cycles
At this time all algos worked to completion and again gave different results...
Source code

вторник, 7 ноября 2023 г.

kssp library

I`ve tried to solve CSES task "visiting cities"

Looks like you can use kind of brute-force - get 1st shortest paths, then all remained with the same cost and make union of cities in each path - nor elegant nor smart algorithm, just to estimate if this approach works at all

I remember from my university course "graph theory" about Yen`s algo to get K-th shortest path so choosed to use some ready (and hopefully well-tested) implementation - kssp library from INRIA (yep - famous place where OCaml was invented)

And then happened real madness - different algos gave me different result and moreover - they didn't match with "correct" results from CSES! Lets see what I got (all results for test 7, compilation options -O3 -DTIME -DNDEBUG):

  • Yen - 470s, 30157 cities
  • node classification - 4.37s, 30140 cities
  • postponed node classification - 3.83s, 30140 cities
  • postponed node classification with star - 3.76s, 30140 cities
  • sidetrack based - consumed 13Gb of memory and met with OOM killer
  • parsimonious sidetrack based - OOM again, perhaps bcs not enough parsimonious :-)

Source code

понедельник, 11 сентября 2023 г.

location lists from dwarf5

I added during past weekend support for var location lists from DWARF5 (located in separate section .debug_loclists) in my dwarfdump. As usually lots of bugs were found

First - they presents only for functions but not for methods. Probably this is real bug and can have serious impact when debugging

Second - generated expressions is not optimal. Lets see example:

locx 53e
4FB5DF - 4FB5F4: DW_OP_piece 0x8, DW_OP_reg0 RAX, DW_OP_piece 0x8, DW_OP_breg0 RAX+0, DW_OP_lit3, DW_OP_lit8, DW_OP_mul, DW_OP_plus, DW_OP_stack_value, DW_OP_piece 0x8
4FB5F4 - 4FB60F: DW_OP_piece 0x8, DW_OP_reg2 RCX, DW_OP_piece 0x8, DW_OP_breg0 RAX+0, DW_OP_lit3, DW_OP_lit8, DW_OP_mul, DW_OP_plus, DW_OP_stack_value, DW_OP_piece 0x8

As you can see this expressions are the same but for adjacent addresses ranges. Why not use single expression for range 4FB5DF - 4FB60F?
Update: I have made patch to estimate amount of identical adjacent location lists. C++ extractor from codeql has total 149850 lists and 116 redudant or 0.077%

DW_OP_mul just pops from stack couple of values and put back result of their multiplication (see evaluation logic in method execute_stack_op), so this sub-expression can be rewritten as just DW_OP_lit24

Also it`s curious to check what other compilers support subj:

gcc

Yes, with options -g -gdwarf-5 -fvar-tracking

golang

No - they even don`t support DWARF5 at all

openwatcom v2

No - judging by the funny comments

WHEN YOU FIGURE OUT WHAT THIS FILE DOES, PLEASE DESCRIBE IT HERE!

пятница, 1 сентября 2023 г.

gcc plugin to collect cross-references, part 7

Part 1, 2, 3, 4, 5 & 6

Lets check if we can extract other kind of constants - numerical. Theoretically there are no problems - they have types INTEGER_CST, REAL_CST, COMPLEX_CST and so on. And you even can meet them - mostly in programs written in fortran
In most code they usually replaced with RTX equivalents like
  • INTEGER_CST - const_int (or const_wide_int)
  • REAL_CST - const_double

const_double is easy case but const_ints are really ubiquitous, they can appear in RTX even when they do not occur in operands of asssembler`s code. So main task is to select only small subset of them. Let`s consider what we can filter out

fields offsets

Luckily this hard part has already been solved in previous part

local variables offsets in stack

RTX has field frame_related:
1 in an INSN or a SET if this rtx is related to the call frame, either changing how we compute the frame address or saving and restoring registers in the prologue and epilogue

this flag affects both parts of set, for loading something from stack it looks something like:
set (reg:DI 0 ax [83])
        (mem/f/c:DI (plus:DI (reg/f:DI 6 bp)
                (const_int -8 [0xfffffffffffffff8]))

and for storing to stack like:
set (mem/f/c:DI (plus:DI (reg/f:DI 6 bp)
                (const_int -8 [0xfffffffffffffff8])) [4 this+0 S8 A64])
        (reg:DI 5 di [ _0 ]))

conditions

Yes, if_then_else almost always follows compare: 
(set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 2 cx [orig:83 _2 ] [83])
            (const_int 0 [0]))) "vtest.cc":44:19 5 {*cmpqi_ccno_1}
(jump_insn 10 9 11 2 (set (pc)
        (if_then_else (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref 16)
            (pc))) "vtest.cc":44:19 891 {*jcc}
All these bulky constructions will be translated to just jz, so no const_int 0 will be placed in output

EH block index

like in each function call:

(expr_list:REG_EH_REGION (const_int 0 [0])

воскресенье, 27 августа 2023 г.

dwarf5 from clang 14

It seems that clang in version 14 utilize more advanced features from DWARF5, so I add their support to my dwarfdump. IMHO most exciting features are:

Section .debug_line_str

In old versions of dwarf filenames have duplicates for each compilation unit. Since dwarf version 5 they storing in separate section and thus shared and save some space. Obviously this space reducing is negligible compared to overhead from types duplication

Section .debug_str_offsets

Also for space reducing each compilation unit has so called base index for strings passed via DW_AT_str_offsets_base. But there is problem - some attributes already can have name before DW_AT_str_offsets_base occurs:


  <0><c>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <d>   DW_AT_producer    : (indexed string: 0): clang version 14.0.6 (git@github.com:github/semmle-code 5c87e7737f331823ed8ed280883888566f08cdea)
    <e>   DW_AT_language    : 33        (C++14)
    <10>   DW_AT_name        : (indexed string: 0x1): c/extractor/src/extractor.cpp
    <11>   DW_AT_str_offsets_base: 0x8


As you can see here 2 tags have names before we have value of string base. Much harder to parse in one pass now

New locations format

I think this is the most cool and useful feature - now each variable and parameter has set of locations linked with address ranges (that`s often case for highly optimized code). Sample:

   Offset Entry 2077
    0024ef56 00000000000006b4 (index into .debug_addr) 004fb3c500000000 (base address)
    0024ef59 0000000000000000 000000000000001c DW_OP_reg5 (rdi)

This cryptic message means that starting from address 0x4fb3c5 (note - most tools like objdump or llvm-dwarfdump cannot correctly show this new locations, in this case objdump showed address in bad format) some local variable located in register rdi until next address range. Seems that both IDA Pro and Binary Ninja cannot use this debug information:
.text:00004FB3C5     mov     rdi, cs:compilation_tf
.text:00004FB3CC     cmp     dword ptr [rdi+0Ch], 0

Global var compilation_tf has type a_trap_file_ptr - pointer to a_trap_file. IDA Pro has that types information from debug info but anyway cannot show access to field of a_trap_file at offset 0xC for next instruction

 
As result of all my patches now I can for example inspect IL structures from Microsoft CodeQL C++ extractor:

суббота, 19 августа 2023 г.

gcc plugin to collect cross-references, part 6

Part 1, 2, 3, 4 & 5
Finally I was able to compile and collect cross-references for enough big open-source projects like linux kernel and botan:
wc -l botan.db
2108274 botan.db
grep Err: botan.db | wc -l
540

So lets check how we can extract access to record fields. If you take quick look at tree.def you can notice very prominent type COMPONENT_REF:
Value is structure or union component.
 Operand 0 is the structure or union (an expression).
 Operand 1 is the field (a node of type FIELD_DECL).
 Operand 2, if present, is the value of DECL_FIELD_OFFSET

 

Sounds easy? "In theory there is no difference between theory and practice". In practice you can encounter many other types in any combinations, like in this relative simple RTL:
(call_insn:TI 1482 1481 2856 35 (call (mem:QI (mem/f:DI (plus:DI (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                    (const_int 24 [0x18])) [744 MEM[(int (*) () *)_199 + 24B]+0 S8 A64]) [0 *OBJ_TYPE_REF(_200;&MEM[(struct _Uninitialized *)&D.349029].D.305525._M_storage->3B) S1 A8])
        (const_int 0 [0])) "/usr/local/include/c++/12.2.1/bits/stl_construct.h":88:18 898 {*call}
     (expr_list:REG_CALL_ARG_LOCATION (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
                (reg/f:DI 41 r13 [386]))
            (nil))
        (expr_list:REG_DEAD (reg:DI 5 di)
            (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                (expr_list:REG_EH_REGION (const_int 0 [0])
                    (expr_list:REG_CALL_DECL (nil)
                        (nil))))))
    (expr_list:DI (use (reg:DI 5 di))
        (nil)))

So I`ll describe in brief some TREE types and how to deal with them to extract something useful

среда, 16 августа 2023 г.

gcc plugin to collect cross-references, part 5

Part 1, 2, 3 & 4

Lets check how RTL describes jump tables. I made simple test and output of gcc -fdump-final-insns looks like:

(jump_insn # 0 0 8 (parallel [
            (set (pc)
                (reg:DI 0 ax [93]))
            (use (label_ref #))
        ]) "swtest.c":14:3# {*tablejump_1}
     (nil)
 -> 8)
(barrier # 0 0)
(code_label # 0 0 8 (nil) [2 uses])
(jump_table_data # 0 0 (addr_vec:DI [
            (label_ref:DI #)
            (label_ref:DI #)
...
        ]))

As you can see jump_insn uses opcode tablejump_1 refering to label 8. Right after this label located RTL with code jump_table_data - perhaps this is bad idea to assume that it always will be true so it`s better to use function jump_table_for_label. Also for some unknown reason option -fdump-final-insns does not show content of jump tables. So at least lets try to find jump_table_datas from plugin

Surprisingly you cannot find then when iterating on instructions within each block (using FOR_ALL_BB_FN/FOR_BB_INSNS macros). I suspect this due to the fact that both label and jump_table belong to block with index 0. So I used another cycle: 
for ( insn = get_insns(); insn; insn = NEXT_INSN(insn) )
Then we can check if current RTL instruction is jump table with JUMP_TABLE_DATA_P. Jump tables have addr_vec in element with index 3 and each element is label_ref. Length of vector can be obtained from field num_elem. Pretty easy, so what we can do with this knowledge?

понедельник, 14 августа 2023 г.

gcc plugin to collect cross-references, part 4

Let`s apply priceless knowledge from previous part - for example to extract string literals and insert polymorphic decryption
Typical call to printf/printk in RTL looks usually like

(insn 57 56 58 9 (set (reg:DI 5 di)
        (symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0x7f480e9edea0 *.LC0>)) "swtest.c":17:7 80 {*movdi_internal} 

(call_insn 59 58 191 9 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("printf") [flags 0x41] <function_decl 0x7f480e8c7000 printf>) [0 __builtin_printf S1 A8])
            (const_int 0 [0]))) "swtest.c":17:7 909 {*call_value}
     (nil)
    (expr_list (use (reg:QI 0 ax))
        (expr_list:DI (use (reg:DI 5 di))
            (nil))))

Translation for mere mortals

суббота, 12 августа 2023 г.

gcc plugin to collect cross-references, part 3

Part 1 & 2
Lets start walk climb on TREEs. Main sources for reference are tree.h, tree-core.h & print-tree.cc
 
Caution: bcs we traveling during RTL pass some of tree types already was removed so it is unlikely to meet TYPE_BINFO/BINFO_VIRTUALS etc

Main structure is tree_base, it included as first field in all other types - for example tree_type_non_common has first field with type tree_type_with_lang_specific,
which has field common with type tree_type_common,
which again has field common with type tree_common
which has field typed with type tree_typed,
which has field base with type tree_base 
Kind of ancient inheritance in pure C

Caution 2: many fields has totally different meaning for concrete types, so GCC strictly stimulate to use macros from tree.h to access all fields

Type of TREE can be obtained with CODE_TREE and name of code with function get_tree_code_name
CODE_TREE returns enum tree_code and it has a lot of values - MAX_TREE_CODES eq 0x175. So lets check only important subset

среда, 9 августа 2023 г.

gcc plugin to collect cross-references, part 2

Because I still fighting with endless variants of unnamed types while processing linux kernel lets talk about persistence
 
The final goal of this plugin is to make from sources database of functions for methods they call and fields they use. After that you can find set of functions referring to some field/method and investigate them later with disasm for example
 
So sure plugin must store it`s results to somewhere
Probably graph databases is better suited for such data - like you can put symbols as vertices and references as edges, then all references to some symbol is just all of it`s incoming edges. But I am too lazy to install JVM and Neo4j, so I used SQLite (and simple YAML-like files for debugging). You can connect your own storage by implementing interface FPersistence

воскресенье, 30 июля 2023 г.

gcc plugin to collect cross-references, part 1

Every user of IDA Pro likes cross-references - they are very useful but applicable for objects in global memory only. What if I want to have cross-references for virtual methods and class/record fields - like what functions some specific virtual method was called from? Unfortunately IDA Pro cannot shows this - partially because this information is not stored in debug info and also due to weak algo for types propagation. Call of virtual method typically looks similar to

 mov rax, [rbp+var_8] ; this
 mov     rax, [rax]   ;
this._vptr
 add     rax, 10h
 mov     rcx, [rax]   ; load method from vtable, why not
mov rcx, [rax+0x10]?
 call    rcx ; or even better just call [rax+0x10]?

Lets think where we can get such kind of cross-references - sure compiler must have it somewhere inside to generate native code, right? So generally speaking compiler is your next friend (right after disassembler and debugger).

Run gcc with -c -fdump-final-insns options on simple C++ test file and check how call of virtual method looks like:
(call_insn # 0 0 2 (set (reg:DI 0 ax)
        (call (mem:QI (reg/f:DI 1 dx [orig:85 _4 ] [85]) [ *OBJ_TYPE_REF(_4;this_7(D)->3B) S1 A8])
            (const_int 0 [0]))) "vtest.cc":31:21# {*call_value}

What? What is _4, which type has this and what means ->3B instead of method name? Looking ahead, I can say that actually all needed information really stored in RTL thought function dump_generic_node (from tree-pretty-print.cc) is just too lazy to show it properly. Seems that we can develop gcc plugin to extract this cross-references (in fact, the first couple of months of development this was not at all obvious)

why gcc?

bcs gcc is standard de-facto and you can expect that you will able to build with it almost any sources (usually after numerous loud curses finally read the documentation), even on windows. Nevertheless lets consider other popular alternatives

пятница, 26 мая 2023 г.

ctf-like task based on maximal clique problem

Sources

There is undirected graph with 1024 vertices and 100909 edges (so average degree is 98.5). It is known that the graph contains clique with size 16. You can pass indexes of clique`s vertices in command line like

./ctf 171 345

./ctf 171 346
too short clique 

This vertices of clique then used to derive AES key and decrypt some short string

Can you solve this?

среда, 24 мая 2023 г.

yet another maximal clique algorithm

It seems that most of known algorithms for maximal clique try to add as much vertices as possibly and evolving towards more complex heuristics for vertices ordering. But there is opposite way - we can remove some vertices from neighbors, right?

Lets assume that we sorted all vertices of graph with M vertices and N edges by their degrees in descending order and want to check if some vertex with degree K can contain clique. We can check if all of it`s neighbors mutually connected and find one or several most loosely connected vertices - lets name it L. This checking requires K -1 access to adjacency matrix for first vertex, K -2 for second etc - in average (K^ 2) / 2. If no unconnected vertices was found - all survived neighbors are clique. See sample of implementation in function naive_remove

Now we should decay what we can do with L and there is only 2 variants:

  1. we can remove it from set of neighbors
  2. we can keep it and remove from set of neighbors all vertices not connected with L

Notice that in both cases amount of neighbors decreased by at least 1. Now we can recursively repeat this process with removed and remained L at most K times, so complexity will be O = (K ^ 2) / 2 * (2 ^ K)

We can repeat this process for all vertices with degree bigger than maximal size of previously found clique - in worse case M times, so overall complexity of this algorithm is O = M * (K ^ 2) / 2 * (2 ^ K)

In average K =  N / M

well, not very good result but processing of each vertex can be done in parallel

We can share adjacency matrix (or even make it read-only) between all working threads and this recursive function will require in each step following memory:

  1. bitset of survived neighbors - K / 8 where V[i] is 1 if this vertex belongs to neighbors and 0 if it was removed
  2. array for unconnected vertices counts with size K

given that recursion level does not exceed K overall used space on stack is

S = K * (K / 8 + K * sizeof(index))

now check if we can run this algorithm on

gpu

Disclaimer: I read book about CUDA programming almost 10 years ago so I can be wrong

воскресенье, 21 мая 2023 г.

estimation of maximum clique size

definition 1.1 from really cool book "The Design of Approximation Algorithms":

An α-approximation algorithm for an optimization problem is a polynomial-time algorithm that for all instances of the problem produces a solution whose value is within a factor of α of the value of an optimal solution

so you need first to estimate at least size of possible optimal solution, right?

Surprisingly I was unable to find it for maximal clique. stackexchange offers very simple formula (spoiler: the actual size is a couple of orders of magnitude smaller). python networkX offers method with complexity O(1.4422n) to find maximal clique itself only. cool. Let's invent this algorithm by ourselves

From wikipedia:

A clique, C, in an undirected graph G = (V, E) is a subset of the vertices, CV, such that every two distinct vertices are adjacent

in other words this means that graph with maximal clique of size K should contains at least K vertices with degree K - 1 or bigger. So we can arrange vertices on degrees and find some degree S where amount of vertices with degree S or bigger is >= S. But this is very rough estimation and it could be refined taking into account the following observation - we can remove all edges to vertices not belonging to this subgraph. So algo is:

  1. calculate degrees of all vertices and arrange them in descending order
  2. for each degree S find first where amount of vertices with degree S or bigger is >= S
  3. put all such vertices in sub-graph SD
  4. remove from SD all edges to vertices not belonging to SD
  5. recalculate degrees of all vertices in SD
  6. find another degree S in SD where amount of vertices with degree S or bigger is >= S. this will be result R

next we can repeat steps 2-6 until enumerate all degrees or some degree will be less than the previously found result R

Complexity

Let N - amount of vertices and M - amount of edges. Then cycle can run max N times and in each cycle we can remove less that M edges (actually in average M/2), so in worst case complexity is O(MN/2)

Results 

суббота, 15 апреля 2023 г.

custom attributes in gcc and dwarf

Lets check if we can add our own attributes (if Google can afford it, then why is it forbidden to mere mortals?). For example I want to have in gcc and dwarf flag about functions/methods parameters direction - is some param IN or OUT. I chose the value of this dwarf attribure 0x28ff

It`s pretty obviously that we can add our own custom attribute in gcc - they even have example how to do this. But what about dwarf producer? Long story short - seems that you cannot do it from plugin. The only dwarf related pass for plugins is pass_dwarf2_frame. So we need to patch gcc. But before this we need to 

build gcc from sources

At moment of writing latest stable version of gcc was 12.0 so run 

git clone --branch releases/gcc-12 https://github.com/gcc-mirror/gcc.git

and then follows instructions

patch gcc

Lets see how gcc produces dwarf output. All symbol table formatters implement gcc_debug_hooks and currently gcc has 3 (btw there are patches for mingw to produce PDB, so in theory you could have vmlinux.pdb):

so lets add function add_param_direction in dwarf2out.cc:
bool add_param_direction(tree decl, dw_die_ref parm_die)
{
  bool pa1 = lookup_attribute ("param_in", DECL_ATTRIBUTES (decl));
  bool pa2 = lookup_attribute ("param_out", DECL_ATTRIBUTES (decl));
  if ( !(pa1 ^ pa2) )
    return false;
  unsigned char pa_value = 0;
  // seems that you can`t have flag with value 1 - see gcc_assert at line 9599
  if ( pa1 )
    pa_value = 2;
  if ( pa2 )
    pa_value = 3;
  add_AT_flag(parm_die, (dwarf_attribute)0x28ff, pa_value);
  return true;
}
It first checks if parameter has attribute param_in or param_out (but not both at the same time bcs this is senseless) and adds custom dwarf flag attribute via add_AT_flag call. Then we just need to call this function from gen_formal_parameter_die
 
Now we can add our custom attributes - this can be done via plugin but I preferred to patch c-family/c-attribs.cc:
 
tree handle_param_in_attribute (tree *node, tree name, tree ARG_UNUSED (args),
                         int ARG_UNUSED(flags), bool *no_add_attrs)
{
  if ( !DECL_P (*node) )
  {
    warning (OPT_Wattributes, "%qE attribute can apply to params declarations only", name);
    *no_add_attrs = true;
    return NULL_TREE;
  }
  tree decl = *node;
  if (TREE_CODE (decl) != PARM_DECL)
  {
    warning (OPT_Wattributes, "%qE attribute can apply to params only", name);
    *no_add_attrs = true;
  } else {
    // check presense of param_out
    if ( lookup_attribute ("param_out", DECL_ATTRIBUTES (decl)) )
    {
      warning (OPT_Wattributes, "%qE attribute useless when param_out was used", name);
      *no_add_attrs = true;
      DECL_ATTRIBUTES (decl) = remove_attribute("param_out", DECL_ATTRIBUTES (decl));
    }
  }
  return NULL_TREE;
}

Function handle_param_in_attribute checks that this attribute linked with function/method parameter. Then it checks that the same parameter don`t have attribute param_out - in this case it just removes both
 
All patches located here 

results

понедельник, 10 апреля 2023 г.

custom dwarf attributes in golang

Finally I found them

0x2900

DW_AT_go_kind, form DW_FORM_data1. Internal golang types kind. For example DW_TAG_structure_type can have kind Struct, Slice or String. I made script to extract statistic which kind can be attached to dwarf tags

0x2901

DW_AT_go_key, form DW_FORM_ref_addr - tag ID. Can be attached to kindMap

0x2902

DW_AT_go_elem, form DW_FORM_ref_addr - tag ID. Can be attached to
  • kindChan
  • kindMap
  • kindSlice

0x2903

DW_AT_go_embedded_field, form DW_FORM_flag. If non-zero member is embedded structure

0x2904

DW_AT_go_runtime_type, form DW_FORM_addr. I don`t know what is it - sure this is not real VA bcs it can point to random sections and even be out of elf module

0x2905

DW_AT_go_package_name, form DW_FORM_string. Just package name for compilation unit

0x2906

DW_AT_go_dict_index, form DW_FORM_udata
index of the dictionary entry describing the real type of this type shape
 
 

пятница, 31 марта 2023 г.

DWARF size overhead

I made today simple script to estimate size overhead due types duplication. This is hard task for C++ - bcs some types can have specialized (or partially specialized) template parameters and sure this types should be considered as different. But for plain C we can safely get all high-level types and assume that types with the same name and declared at the same line and column are equal

Next I ran this script on objdump -g dump for linux kernel. Script gave me digit 252741370

Lets find size of .debug_info section

objdump -h vmlinux | grep debug_info
 35 .debug_info   118205ec  0000000000000000  0000000000000000  03037230  2**0

Size is 0x118205ec = 293733868

And finally lets calculate share of unnecessary info: 252741370 /  293733868 = 0,8604

I am shocked - 86%!!! Looks like hd manufacturers conspiracy 

Update: for C++ I made another version of this script to support namespaces and got following results:

  • cc1 from gcc 111858917 / 130272407 = 0,8587
  • gdb 105034993 / 139899598 = 0,7508
  • llvm-dwarfdump 194031570 / 263053224 = 0,7376

четверг, 30 марта 2023 г.

dwarfdump

I made pale analog of world famous pdbdump to dump types and functions from DWARF. Before introducing my tool I have several words about DWARF - it is excess, compiler-specific, inconsistent and dangerous

Redudancy

gcc and llvm put every used types set in each compilation unit. This is really terrible if you use lots of templates like STL/boost - you will have duplicated declarations of std::map, std::string etc. Yep, this is main reason why stripped binaries becomes much smaller:

ls -l llvm-dwarfdump llvm-dwarfdump.stripped

-rwxrwxr-x 1 redp redp 471241104 mar 29 00:52 llvm-dwarfdump
-rwxrwxr-x 1 redp redp 22170696  mar 29 17:49
llvm-dwarfdump.stripped

Another example - lets check how many times function console_printk declared in debug info from linux kernel:
grep console_printk vm.g | wc -l
2883

It is the same function declared in file include/linux/printk.h line 65 column 0xc - why linker can`t merge it`s type producing debug output?
 
Golang tries to fix this problem using types declarations once and then referring to them from another units (and at the same time compressing debug sections with zlib) - this is very ironically bcs anyway binaries on go typically have size in several Mb (btw llvm-dwarfdump cannot process compressed sections)

 

compiler-specific 

This is pretty obvious - each programming language has some unique features and DWARF must deal with all of them
But just look at this:
 <0><b>: Abbrev Number: 1 (DW_TAG_compile_unit)
    <c>   DW_AT_name        : internal/cpu
    <19>   DW_AT_language    : 22       (Go)
    <1a>   DW_AT_stmt_list   : 0x0
    <1e>   DW_AT_low_pc      : 0x401000
    <26>   DW_AT_ranges      : 0x0
    <2a>   DW_AT_comp_dir    : .
    <2c>   DW_AT_producer    : Go cmd/compile go1.13.8
    <44>   Unknown AT value: 2905: cpu

I was unable to find in golang sources meaning of this custom attributes

 

Inconsistency

DWARF specification don`t define lots of important things. Just to name few:
  • order of tags, so you can have mix of formal parameters with types at the same nesting level
  • which attributes are mandatory for tags - I saw lots of missed DW_AT_sibling for example
  • when locations info should be placed in separate section .debug_loc - seems that this happens for inlined subroutines only
  • encoding of addresses. You have DW_AT_low_pc for functions address. But also there is DW_AT_abstract_origin (and DW_AT_specification). The same function can have different addresses even in plain C via this attributes: 
     <1><191cde>: Abbrev Number: 194 (DW_TAG_subprogram)
        <191ce0>   DW_AT_external    : 1
        <191ce0>   DW_AT_name        : (indirect string, offset: 0x24d2f): perf_events_lapic_init
        <191ce4>   DW_AT_decl_file   : 1
        <191ce5>   DW_AT_decl_line   : 1719
        <191ce7>   DW_AT_decl_column : 6
        <191ce8>   DW_AT_prototyped  : 1
        <191ce8>   DW_AT_inline      : 1    (inlined)
     <1><19a945>: Abbrev Number: 96 (DW_TAG_subprogram)
        <19a946>   DW_AT_abstract_origin: <0x191cde>
        <19a94a>   DW_AT_low_pc      : 0xffffffff81004dc0
     <1><19b3c7>: Abbrev Number: 96 (DW_TAG_subprogram)
        <19b3c8>   DW_AT_abstract_origin: <0x191cde>
        <19b3cc>   DW_AT_low_pc      : 0xffffffff81007930


 All of this lead us to conclusion that DWARF is just

Dangerous

True ant-debugging trick - what if attribute DW_AT_type for DW_TAG_pointer_type points to the same tag? How about negative offset in DW_AT_sibling? I believe that this is very reach area for fuzzing

 

Features of dwarfdump