Monday, August 14, 2023

gcc plugin to collect cross-references, part 4

Let's apply the priceless knowledge from the previous part, for example to extract string literals and insert polymorphic decryption.
A typical call to printf/printk in RTL usually looks like this:

(insn 57 56 58 9 (set (reg:DI 5 di)
        (symbol_ref/f:DI ("*.LC0") [flags 0x2] <var_decl 0x7f480e9edea0 *.LC0>)) "swtest.c":17:7 80 {*movdi_internal} 

(call_insn 59 58 191 9 (set (reg:SI 0 ax)
        (call (mem:QI (symbol_ref:DI ("printf") [flags 0x41] <function_decl 0x7f480e8c7000 printf>) [0 __builtin_printf S1 A8])
            (const_int 0 [0]))) "swtest.c":17:7 909 {*call_value}
     (nil)
    (expr_list (use (reg:QI 0 ax))
        (expr_list:DI (use (reg:DI 5 di))
            (nil))))

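Translation for mere mortals:
The first instruction loads register 5 (di) with the address (via symbol_ref) of a known symbol with the cool name "*.LC0".
The second instruction calls another known symbol, "printf", and puts the result in register 0 (ax). The registers the call reads are listed in the trailing expression list: register 0 again (as al, which on x86-64 holds the number of vector registers used by a variadic call) and, in the nested expression list, the previously loaded register 5 with the argument.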
 
So we can record two items in the cross-references database for this function: the load of a symbol and a call. Sadly, the name "*.LC0" is probably totally useless. Let's check if we can go deeper.

We can extract the TREE item for a symbol_ref rtl with SYMBOL_REF_DECL and check its type (tree code) - for a variable the type is VAR_DECL
Then we can check whether this variable is initialized via DECL_INITIAL
A string literal has type STRING_CST
And finally we can get the content of this literal with TREE_STRING_POINTER and its length with TREE_STRING_LENGTH
 
I implemented this logic in the method is_cliteral
Also, for storing literals I added a new virtual method to the persistence interface
 
You may ask: is it possible to extract integer constants like sizeof? Unfortunately no, mainly because they were already converted to RTL const_int during expression evaluation, and RTL has no mechanism to track why a given const_int has its value. A typical use of sizeof may look like:

... =(some_struct *)kmalloc(sizeof(some_struct) + strlen(s) + 1, GFP_KERNEL);

In RTL this code is converted to a call to strlen, then a const_int is added to the result in a register (its value is the evaluated expression 1 + sizeof(some_struct)), and then the sum is passed in a register or on the stack to kmalloc via expr_list. By the way, the second argument is also passed as a const_int, so it's impossible to map the value back to the GFP_KERNEL constant
