Saturday, August 19, 2023

gcc plugin to collect cross-references, part 6

Part 1, 2, 3, 4 & 5
I was finally able to compile and collect cross-references for reasonably large open-source projects such as the Linux kernel and Botan:
wc -l botan.db
2108274 botan.db
grep Err: botan.db | wc -l
540

So let's check how we can extract accesses to record fields. If you take a quick look at tree.def, you will notice a very prominent node type, COMPONENT_REF:
Value is structure or union component.
 Operand 0 is the structure or union (an expression).
 Operand 1 is the field (a node of type FIELD_DECL).
 Operand 2, if present, is the value of DECL_FIELD_OFFSET

 

Sounds easy? "In theory there is no difference between theory and practice". In practice you can encounter many other node types in any combination, as in this relatively simple RTL:
(call_insn:TI 1482 1481 2856 35 (call (mem:QI (mem/f:DI (plus:DI (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                    (const_int 24 [0x18])) [744 MEM[(int (*) () *)_199 + 24B]+0 S8 A64]) [0 *OBJ_TYPE_REF(_200;&MEM[(struct _Uninitialized *)&D.349029].D.305525._M_storage->3B) S1 A8])
        (const_int 0 [0])) "/usr/local/include/c++/12.2.1/bits/stl_construct.h":88:18 898 {*call}
     (expr_list:REG_CALL_ARG_LOCATION (expr_list:REG_DEP_TRUE (concat:DI (reg:DI 5 di)
                (reg/f:DI 41 r13 [386]))
            (nil))
        (expr_list:REG_DEAD (reg:DI 5 di)
            (expr_list:REG_DEAD (reg/f:DI 0 ax [orig:340 MEM[(struct Server_Hello_13 *)_325].D.264452.D.264115._vptr.Handshake_Message ] [340])
                (expr_list:REG_EH_REGION (const_int 0 [0])
                    (expr_list:REG_CALL_DECL (nil)
                        (nil))))))
    (expr_list:DI (use (reg:DI 5 di))
        (nil)))

So I'll briefly describe some TREE types and how to deal with them to extract something useful.

 

COMPONENT_REF

Usually several component references form a chain of fields. For SomeStruct.f1.f2.f3 there will be three (operand numbers as in the tree.def comment above):
  1. COMPONENT_REF: Op1 contains the FIELD_DECL for field f3, and Op0 references
  2. COMPONENT_REF: Op1 contains the FIELD_DECL for field f2, and Op0 references
  3. COMPONENT_REF: Op1 contains the FIELD_DECL for field f1, and Op0 finally references an expression of RECORD_TYPE/UNION_TYPE for SomeStruct

Pretty easy? Actually not. There are at least two problems:

  • The operands can contain other node types, for example SSA_NAME
  • A record anywhere in the chain can be nameless. For C++ you can find the enclosing class with the function get_containing_scope, but in C all nested nameless structures actually have TRANSLATION_UNIT_DECL as their scope; in that case there is a chance that the chain will be left unlinked

Dirty hack: you don't even need the RECORD_TYPE for each field, because you can extract it from the field with DECL_CONTEXT.
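To make the chain walk concrete, here is a minimal sketch on a toy model. The Node struct, the Code enum and collect_field_refs are hypothetical stand-ins invented for this illustration, not GCC's real tree API; in a plugin the same steps would go through TREE_OPERAND and DECL_CONTEXT. The operands follow the tree.def numbering (operand 0 is the containing expression, operand 1 is the field).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-ins for GCC tree nodes; none of these types exist in GCC itself.
enum class Code { ComponentRef, FieldDecl, VarDecl, RecordType };

struct Node {
    Code code;
    std::string name;     // empty for a nameless record
    const Node* op0;      // ComponentRef: the containing expression
    const Node* op1;      // ComponentRef: the FieldDecl
    const Node* context;  // FieldDecl: owning record (models DECL_CONTEXT)
};

// Walk a chain of ComponentRef nodes from the outermost access inward and
// return "Record.field" strings ordered from the first field to the last.
std::vector<std::string> collect_field_refs(const Node* expr) {
    std::vector<std::string> refs;
    while (expr != nullptr && expr->code == Code::ComponentRef) {
        const Node* field = expr->op1;
        assert(field != nullptr && field->code == Code::FieldDecl);
        // The record is taken from the field itself, not from operand 0.
        const Node* rec = field->context;
        const std::string rec_name =
            (rec != nullptr && !rec->name.empty()) ? rec->name : "<anonymous>";
        refs.insert(refs.begin(), rec_name + "." + field->name);
        expr = expr->op0;  // step to the inner expression
    }
    return refs;
}
```

The walk stops as soon as operand 0 is not another ComponentRef, so an SSA_NAME or MEM_REF in that position would need its own handler.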

 

SSA_NAME

Just a reference to some other tree node; the type behind it can be extracted with TREE_TYPE. See the function dump_ssa_name.


MEM_REF

The type of the MEM_REF is the type the bytes at the memory location are interpreted as.
   MEM_REF <p, c> is equivalent to ((typeof(c))p)->x... where x... is a
   chain of component references offsetting p by c

For a TARGET_MEM_REF the type can be reached through TMR_BASE and the offset through TMR_OFFSET; for a MEM_REF use TREE_OPERAND(expr, 0) for the base and TREE_OPERAND(expr, 1) for the offset.

Well, it would be good to find the field at this offset, right? The first field of a record can be extracted with TYPE_FIELDS and each following one with TREE_CHAIN. See the function dump_mem_ref for details.
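The lookup itself is a plain linear scan over the field list. Below is a minimal sketch on a toy model: the Field struct and field_at_offset are hypothetical names made up for this example, with the `next` pointer playing the role of TREE_CHAIN and the list head playing the role of TYPE_FIELDS. It is not the code of dump_mem_ref.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of a FIELD_DECL list; illustrative only, not a GCC type.
struct Field {
    std::string name;
    std::size_t offset;  // byte offset inside the record
    std::size_t size;    // size in bytes
    const Field* next;   // next field in the record, or nullptr
};

// Return the first field whose byte range [offset, offset + size) covers
// `off`, or nullptr when the offset falls into padding or past the record.
const Field* field_at_offset(const Field* first, std::size_t off) {
    for (const Field* f = first; f != nullptr; f = f->next) {
        assert(f->size > 0);  // zero-sized fields are not modelled here
        if (off >= f->offset && off - f->offset < f->size)
            return f;
    }
    return nullptr;
}
```

For a union several fields share offset 0, so this sketch returns only the first match; a real tool would probably want to report all candidates.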


ADDR_EXPR

& in C.  Value is the address at which the operand's value resides
The operand can be extracted with TREE_OPERAND(expr, 0), and again it can be any of the TREE types (even STRING_CST!). See the function dump_addr_expr for details.

 

OBJ_TYPE_REF

Used to represent lookup in a virtual method table which is dependent on
   the runtime type of an object.  Operands are:
   OBJ_TYPE_REF_EXPR: An expression that evaluates the value to use.
   OBJ_TYPE_REF_OBJECT: Is the object on whose behalf the lookup is
   being performed.  Through this the optimizers may be able to statically
   determine the dynamic type of the object.
   OBJ_TYPE_REF_TOKEN: An integer index to the virtual method table.
   The integer index should have as type the original type of
   OBJ_TYPE_REF_OBJECT
 

This is the main source for collecting virtual method calls; the base class can be extracted with the function obj_type_ref_class.
Warning: don't try to search in the virtual table (you can get to it from TYPE_VFIELD). Look at the output of gcc -fdump-lang-class to see what it contains and why it is totally unusable in most cases.
Warning 2: you can use DECL_VINDEX only on declarations (like FUNCTION_DECL). Often the only thing you have is a type (like METHOD_TYPE), for example when an SSA name holds a pointer to a method. In gcc there is no way to backtrack from a type to the declaration it belongs to.
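If you want to see an OBJ_TYPE_REF with your own eyes, a tiny virtual call is enough. The sketch below uses hypothetical class names (Base, Derived) and a hypothetical helper call_id; compiling it with g++ -fdump-tree-gimple should show the call in call_id wrapped in OBJ_TYPE_REF, and -fdump-lang-class prints the vtable layout mentioned in the warning above. The exact dump text depends on the gcc version and optimization level, so treat this only as a starting point.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}
    virtual int id() const { return 1; }
};

struct Derived : Base {
    int id() const override { return 2; }
};

// The call below is dispatched through the vtable, so the callee is not a
// plain FUNCTION_DECL unless the compiler manages to devirtualize it.
int call_id(const Base* b) {
    assert(b != nullptr);
    return b->id();
}
```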

 

So now we can collect all kinds of accesses to class/structure fields and methods. The only uncovered case is a pointer to method: can it be tracked? Unfortunately not, neither where the offset of the method is assigned nor where it is called. I wrote a simple test, and the method get_ref looks like this in the disassembly:

      mov     eax, 33 ; just some const
      mov     edx, 0
      pop     rbp

and in RTL:
(insn 6 3 7 2 (set (reg:DI 0 ax [orig:82 D.3252 ] [82])
        (const_int 33 [0x21])) "vtest.cc":40:24 80 {*movdi_internal}
     (nil))
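For reference, a test of this kind might look like the sketch below. It is not the original vtest.cc: the classes Shape and Square and the helper call_ref are hypothetical, only the name get_ref comes from the text. Under the Itanium C++ ABI that gcc follows on x86-64, a pointer to a virtual member function is a pair of values: 1 plus the vtable offset in bytes, and a this-adjustment. That would explain why all that is left is a bare constant like 33 (offset 32) in one register and 0 in the other.

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() {}
    virtual int sides() const { return 0; }
    virtual int area() const { return 0; }
};

struct Square : Shape {
    int sides() const override { return 4; }
    int area() const override { return 16; }
};

typedef int (Shape::*ShapeMethod)() const;

// Returns a pointer to a virtual method. No function address is involved,
// only a constant that describes the vtable slot.
ShapeMethod get_ref() { return &Shape::area; }

// Calls through the pointer to method; dispatch still goes via the vtable
// of the dynamic type of *s.
int call_ref(const Shape* s, ShapeMethod m) {
    assert(s != nullptr && m != nullptr);
    return (s->*m)();
}
```

The constant your compiler emits for this particular layout may differ from the 33 shown above, since it depends on how many virtual functions precede the chosen one.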


For some unknown reason there are no OFFSET_REF or PTRMEM_CST nodes left in the RTL.
