Sunday, July 30, 2023

gcc plugin to collect cross-references, part 1

Every IDA Pro user loves cross-references: they are very useful, but they only apply to objects in global memory. What if I want cross-references for virtual methods and class/record fields, for example, which functions a specific virtual method is called from? Unfortunately IDA Pro cannot show this, partly because this information is not stored in debug info, and partly because of its weak type-propagation algorithm. A virtual method call typically looks similar to this:

 mov     rax, [rbp+var_8] ; this
 mov     rax, [rax]       ; this._vptr
 add     rax, 10h
 mov     rcx, [rax]       ; load method from vtable, why not mov rcx, [rax+0x10]?
 call    rcx              ; or even better, just call [rax+0x10]?
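For reference, here is a plausible kind of source that produces such a sequence. It is a hypothetical test file (the original vtest.cc is not shown), and the slot arithmetic assumes the Itanium C++ ABI that g++ uses on x86-64 with 8-byte pointers: a virtual destructor occupies two vtable slots (complete-object and deleting destructor), which pushes the first ordinary virtual method to offset 0x10.

```cpp
#include <cassert>

// Hypothetical test input. Assuming the Itanium C++ ABI (g++ on x86-64),
// the virtual destructor takes vtable slots 0 and 1, so f() lands in slot 2.
struct Base {
  virtual ~Base() = default;
  virtual int f() { return 1; }   // slot 2: [vptr + 0x10]
  virtual int g() { return 10; }  // slot 3: [vptr + 0x18]
};

struct Derived : Base {
  int f() override { return 2; }  // replaces slot 2 in Derived's vtable
};

// The dynamic type of b is unknown here, so the call is dispatched through
// the vtable: load b's vptr, load the pointer stored at +0x10, call it.
int call_f(Base &b) { return b.f(); }
```

The compiler sees only a `Base &` inside `call_f`, so it cannot resolve the target statically; this is exactly the kind of call site whose target a disassembler cannot attribute to `Base::f` on its own.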

Let's think about where we can get this kind of cross-reference. The compiler must have it somewhere inside in order to generate native code, right? So, generally speaking, the compiler is your next friend (right after the disassembler and the debugger).

Run gcc with the -c -fdump-final-insns options on a simple C++ test file and check what a virtual method call looks like:
(call_insn # 0 0 2 (set (reg:DI 0 ax)
        (call (mem:QI (reg/f:DI 1 dx [orig:85 _4 ] [85]) [ *OBJ_TYPE_REF(_4;this_7(D)->3B) S1 A8])
            (const_int 0 [0]))) "vtest.cc":31:21# {*call_value}

What? What is _4, what type does this have, and what does ->3B mean instead of a method name? Looking ahead, I can say that all the needed information really is stored in the RTL; the function dump_generic_node (from tree-pretty-print.cc) is just too lazy to show it properly. So it seems we can develop a gcc plugin to extract these cross-references (in fact, during the first couple of months of development this was not at all obvious).
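gcc's tree.def documents OBJ_TYPE_REF as having three operands: OBJ_TYPE_REF_EXPR (the expression that evaluates to the function address), OBJ_TYPE_REF_OBJECT (the object on whose behalf the lookup is performed) and OBJ_TYPE_REF_TOKEN (an integer index into the virtual method table). So in the dump above, _4 is the SSA name holding the loaded function pointer, this_7(D) is the object, and 3B is how the token gets printed. A plugin should read these operands from the tree itself; the sketch below is only a crude text-level illustration that splits the printed form back into the three operands, assuming the exact `OBJ_TYPE_REF(expr;object->token)` layout seen in this dump. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Printed pieces of an OBJ_TYPE_REF, in tree.def operand order.
struct ObjTypeRefText {
  std::string expr;    // OBJ_TYPE_REF_EXPR, e.g. the SSA name _4
  std::string object;  // OBJ_TYPE_REF_OBJECT, e.g. this_7(D)
  std::string token;   // OBJ_TYPE_REF_TOKEN, kept as printed (e.g. "3B")
};

// Hypothetical helper: returns false if `dump` has no well-formed
// OBJ_TYPE_REF(...) in it.
bool parse_obj_type_ref(const std::string &dump, ObjTypeRefText &out) {
  const std::string head = "OBJ_TYPE_REF(";
  std::size_t start = dump.find(head);
  if (start == std::string::npos) return false;
  start += head.size();

  // Find the ')' closing OBJ_TYPE_REF(, skipping nested parens such as (D).
  int depth = 1;
  std::size_t end = start;
  for (; end < dump.size() && depth > 0; ++end) {
    if (dump[end] == '(') ++depth;
    else if (dump[end] == ')') --depth;
  }
  if (depth != 0) return false;
  const std::string body = dump.substr(start, end - 1 - start);

  // expr ends at the first ';'; the token follows the last "->", so an
  // object expression that itself contains "->" is still split correctly.
  const std::size_t semi = body.find(';');
  const std::size_t arrow = body.rfind("->");
  if (semi == std::string::npos || arrow == std::string::npos || arrow < semi)
    return false;
  out.expr = body.substr(0, semi);
  out.object = body.substr(semi + 1, arrow - semi - 1);
  out.token = body.substr(arrow + 2);
  return true;
}
```

Note what the text lacks: nothing here names the class or the method, which is why the real work has to happen on the tree nodes rather than on dumps.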

why gcc?

Because gcc is the de facto standard, and you can expect to be able to build almost any sources with it (usually after numerous loud curses and finally reading the documentation), even on Windows. Nevertheless, let's consider other popular alternatives.

visual c++

Unlike .NET with its excellent Roslyn, Microsoft doesn't allow you to write plugins for its C++ compiler, nor does it even describe its internal IR.

clang

Don't write a clang plugin (c)

In fact, you probably could do this. If I remember correctly, LLVM IR has full support for virtual methods. Another question is to what extent the final native code matches the IR: some pieces of code could be removed by dead-code elimination, merged by the GCSE pass, and so on.

Besides, LLVM still has some problems producing a bootable Linux kernel.

Anyway, I am too stupid to understand the bunch of LLVM classes with hellish templates, and too impatient to wait several hours every time I recompile clang after inserting a couple of debug prints.

gcc plugin basics

There are several good examples of gcc plugins: 1 & 2
The first one has a repeated bug when dumping plugin parameters: plugin_info->argv[i].value should be checked against NULL because, according to the documentation, in the gcc command-line option
-fplugin-arg-NAME-<key>[=<value>]
the value is optional.
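A minimal sketch of the NULL-safe dump. To keep it compilable without the plugin headers, it declares a local stand-in with the same two fields as struct plugin_argument from gcc-plugin.h; the function name describe_args and the sample keys are hypothetical.

```cpp
#include <cassert>
#include <string>

// Local stand-in with the same fields as struct plugin_argument in
// gcc-plugin.h, so the sketch builds without the plugin headers.
struct plugin_argument {
  char *key;    // the <key> part of -fplugin-arg-NAME-<key>[=<value>]
  char *value;  // NULL when "=<value>" is omitted
};

// Formats the arguments as "key" or "key=value", separated by spaces.
std::string describe_args(int argc, const plugin_argument *argv) {
  std::string out;
  for (int i = 0; i < argc; ++i) {
    if (i > 0) out += ' ';
    out += argv[i].key;
    if (argv[i].value != nullptr) {  // printing a NULL value would crash
      out += '=';
      out += argv[i].value;
    }
  }
  return out;
}
```

Inside plugin_init this would be called as describe_args(plugin_info->argc, plugin_info->argv), so an option such as -fplugin-arg-NAME-verbose without a value is printed as just its key instead of crashing the compiler.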
So I just wrote a class derived from rtl_opt_pass, registered it for the pass "final" (sounds fatal and extremely intimidating), and now in its virtual method execute we can open the gates to hell.
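In a real plugin, the placement is described by a struct register_pass_info (fields pass, reference_pass_name, ref_pass_instance_number, pos_op) handed to register_callback with PLUGIN_PASS_MANAGER_SETUP. The original does not say on which side of "final" the pass goes, so the sketch below is only a toy model of the positioning semantics over a list of pass names: the enumerator names mirror enum pass_positioning_ops from tree-pass.h, while place_pass and the pass names pass_a, pass_b and xrefs are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Enumerator names mirror gcc's tree-pass.h; everything else is a toy model.
enum pass_positioning_ops {
  PASS_POS_INSERT_AFTER,
  PASS_POS_INSERT_BEFORE,
  PASS_POS_REPLACE
};

// Places `name` relative to occurrences of `ref`. As with
// ref_pass_instance_number, instance 0 means every occurrence and
// N > 0 means only the Nth one (1-based).
std::vector<std::string> place_pass(const std::vector<std::string> &passes,
                                    const std::string &ref, int instance,
                                    pass_positioning_ops op,
                                    const std::string &name) {
  std::vector<std::string> out;
  int seen = 0;
  for (const std::string &p : passes) {
    bool hit = false;
    if (p == ref) {
      ++seen;
      hit = (instance == 0 || seen == instance);
    }
    if (!hit) {
      out.push_back(p);
      continue;
    }
    switch (op) {
      case PASS_POS_INSERT_BEFORE:
        out.push_back(name);
        out.push_back(p);
        break;
      case PASS_POS_INSERT_AFTER:
        out.push_back(p);
        out.push_back(name);
        break;
      case PASS_POS_REPLACE:
        out.push_back(name);
        break;
    }
  }
  return out;
}
```

Either way, a pass hooked next to "final" sees the RTL in (nearly) the shape that becomes machine code, which is what makes the collected cross-references match the binary.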

walking the RTL tree

RTL has quite a simple structure compared to GENERIC/GIMPLE: for example, the maximal value of the rtx_code enum is 0x98 (LAST_AND_UNUSED_RTX_CODE), versus 0x175 for tree_code (MAX_TREE_CODES).
 
gcc doesn't provide a ready-to-use iterator for RTL, but you can cut & paste some code from the function rtx_writer::print_rtx. As you can see in rtx_writer::print_rtx_operand, you can extract the format string with GET_RTX_FORMAT (GET_CODE (in_rtx)) and then call a format-specific dumper for each operand. Some of these dumpers in turn call print_rtx again, so the process is recursive, and I had to add a stack to track all parent RTL nodes (this way I know that the current expression is part of an assignment to some memory if, for example, I see set mem:0 in this stack).
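The recursive walk with a parent stack can be sketched outside gcc with a toy node type. This is a hypothetical model, not gcc code: each node carries a code and a format string with one letter per operand in the spirit of GET_RTX_FORMAT ('e' = one sub-expression, 'E' = a vector of them, 'i' = an integer), where real code would use accessors such as XEXP, XVECLEN and XVECEXP. The walker records which registers sit inside a MEM that is operand 0 (the destination) of a SET, i.e. which ones address a store.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

struct Rtx;
struct Operand {
  long ival = 0;           // used when the format letter is 'i'
  Rtx *expr = nullptr;     // used when the format letter is 'e'
  std::vector<Rtx *> vec;  // used when the format letter is 'E'
};
struct Rtx {
  std::string code;          // e.g. "set", "mem", "reg"
  std::string format;        // one letter per operand
  std::vector<Operand> ops;  // ops.size() == format.size()
};

std::deque<Rtx> arena;  // owns all nodes; deque keeps their addresses stable

Rtx *node(const std::string &code, const std::string &format) {
  arena.push_back(Rtx());
  Rtx *x = &arena.back();
  x->code = code;
  x->format = format;
  x->ops.resize(format.size());
  return x;
}
Rtx *reg(long regno) { Rtx *x = node("reg", "i"); x->ops[0].ival = regno; return x; }
Rtx *unary(const std::string &code, Rtx *a) { Rtx *x = node(code, "e"); x->ops[0].expr = a; return x; }
Rtx *binary(const std::string &code, Rtx *a, Rtx *b) {
  Rtx *x = node(code, "ee");
  x->ops[0].expr = a;
  x->ops[1].expr = b;
  return x;
}

// Recursive walker keeping a stack of (parent, operand index) frames.
struct Walker {
  struct Frame { const Rtx *parent; std::size_t operand; };
  std::vector<Frame> stack;
  std::vector<std::string> store_addr;  // regs inside a MEM that a SET writes to
  std::vector<std::string> other;       // every other reg

  // True if some SET is being walked through operand 0 and that child is a MEM.
  bool in_store_dest() const {
    for (std::size_t i = 0; i + 1 < stack.size(); ++i)
      if (stack[i].parent->code == "set" && stack[i].operand == 0 &&
          stack[i + 1].parent->code == "mem")
        return true;
    return false;
  }

  void walk(const Rtx *x) {
    if (x == nullptr) return;
    if (x->code == "reg") {
      std::string name = "reg:" + std::to_string(x->ops[0].ival);
      (in_store_dest() ? store_addr : other).push_back(name);
      return;
    }
    for (std::size_t i = 0; i < x->format.size(); ++i) {
      stack.push_back({x, i});
      if (x->format[i] == 'e') {
        walk(x->ops[i].expr);
      } else if (x->format[i] == 'E') {
        for (const Rtx *e : x->ops[i].vec) walk(e);
      }  // scalar letters such as 'i' have no sub-rtx to visit
      stack.pop_back();
    }
  }
};
```

For example, walking (set (mem (plus (reg 6) (reg 0))) (reg 1)) puts reg:6 and reg:0 into store_addr and reg:1 into other, while a load such as (set (reg 2) (mem (reg 3))) puts nothing into store_addr.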
 
But the most important things here are the calls to print_mem_expr scattered in arbitrary places: a thin wrapper around print_generic_expr. This is where you finally get access to the real gcc internals: the tree_node union, which contains everything, starting from the structure tree_base.
This is a huge topic, and I hope to describe it in the next part.
