суббота, 16 ноября 2024 г.

perl module for powerpc disasm

It seems that there are no good open-source disasm for PPC except Capstone. And it even has perl binding - unfortunately it can extract only basic fields like opcode and text but no operands for specific processor. So I made yet another

I would like to make a few critical remarks about the capstone itself

1) it's fat as pig - with default settings we have
ls -l libcapstone.a
-rw-rw-r-- 1 redp redp 93833624 nov  8 16:10 libcapstone.a
Sure this can be fixed with selecting just needed processors but anyway makes a lasting impression

2) it's inconsistent. Just couple of examples
mr r31, r3 actually has instruction id PPC_INSN_OR. to get MR you must check alias ID, and even in this case they have PPC_INS_ALIAS_MR & PPC_INS_ALIAS_MR_

Another example - ld r3, something actually has PPC_REG_X3 instead of PPC_REG_R3 despite the fact that these are the same register. Why clone lots of registers if you produce the same output for them? And why not add size of register? I suspect this happens bcs they used MD from llvm and was too lazy to make some optimizations

3) it's incomplete. For example they don't implemented reg_access for powerpc (as well as for alpha, risc-v, sparc etc)

multiple TOCs

It's critical to track TOC for distinguishing loading some constant vs loading at some address. Lets check following code

; prologue
addis r2, r12, 0x1d8
addi r2, r2, 0x70 ; TOC
...
addis r3, r2, 0x19
ld r3, -0x1ca8(r3) ; kmem_cache

Here r3 adjusting from r2 holding address of TOC to get address of kmem_cache. And this is how loading of constant looks like:
lis r10, 0xa9
ori r10, r10, 0xc00 ; A90C00
And yes - LIS is again alias PPC_INS_ALIAS_LIS, instruction ID is PPC_ADDIS. 

Official doc is very unclear regarding the possibility of having multiple TOCs. Theoretically if linker would put TOC somewhere in middle of .bss/.data/.rdata sections it can cover full 32bit address space. But what if the program has size bigger 4Gb or it was linked with code from several compilers like llvm & gcc? Unfortunately I don't have such samples so cannot say if my disasm will work correctly in case of having several TOCs

results

Anyway, enjoy - powepc disassembler in 120LOC

пятница, 1 ноября 2024 г.

perl module for DWARF debug info parsing

I've made perl binding of my c++ dwarf dumper. Supports 64/32 bit, no DWO, don't sure if it can load arbitrary object files but at least can parse LKM - I even added relocations processing
Maybe one day I'll have the courage to write documentation for it, so for now just several samples of using it:
  • script to dump enums
  • script to find all functions having arguments with pointer or reference to some named structure
  • script to find and dump structures fields
  • script to find all classes with VTBL and dump them

So now I have in perl full set of modules to disasm/analyse elf files:

воскресенье, 20 октября 2024 г.

binding c++ objects to perl

Sure there are lots of ways to do this (as usually in Perl: "there's more than one way to do it"), just to name few:

  • swig - result looks not native and clumsy, also you need to make facade-like interfaces for really complex classes
  • ffi - also has problems with complex classes
  • XS - official and wide-spread way. Btw smoothest and most seamless Win32::OLE module uses it exactly

and besides you can simple call perl functions directly from c++ code (with all bells and whistles "dSP; ENTER; SAVETMPS" etc) like polymake do

So this article is just remaining for myself how to bind small-sized c++ classes to perl using XS. Lets begin with simplest one - just perl wrapper around Interval-Tree

  • to create new skeleton: h2xs -cfn Module::Name
  • patch Makefile.PL to add XSOPT="-C++" and change C compiler CC="g++", also highly likely we should add -lstdc++ to LIBS
  • don't forget to add all source/header files to MANIFEST

Code is simple, I prefer to make my own MAGIC instead of implementation DESTROY method - you will see further why

четверг, 3 октября 2024 г.

TLS in gcc RTL

Lets check how TLS looks like in RTL. I wrote simple test:
(insn 29 8 10 3 (set (reg:SI 0 ax [orig:82 _1 ] [82])
        (mem/c:SI (const:DI (unspec:DI [
                        (symbol_ref:DI ("tls_state") [flags 0x2a]  <var_decl 0x7f61bce052d0 tls_state>)
                    ] UNSPEC_NTPOFF)) [2 tls_state.idx+0 S4 A32 AS1])) "stest.cc":15:37 81 {*movsi_internal}
     (nil))

We can see attribute UNSPEC here - it can be extracted as XINT (in_rtx, 1) for GET_CODE == UNSPEC. The first problem here is that values UNSPEC_XXX are machine-specific. For example the same code while cross-compiling for mips:
insn 60 13 61 (set (reg:SI 3 $3)
        (unspec:SI [
                (const_int 0 [0])
            ] UNSPEC_TLS_GET_TP)) "stest.cc":15:37 706 {*tls_get_tp_si_split}
     (nil))
(insn 61 60 15 (set (reg:SI 2 $2 [201])
        (reg:SI 3 $3)) "stest.cc":15:37 313 {*movsi_internal}
     (nil))
(insn 15 61 16 (set (reg:SI 3 $3 [202])
        (high:SI (const:SI (unspec:SI [
                        (symbol_ref:SI ("tls_state") [flags 0x2a]  <var_decl 0x7ff96a9dfcf0 tls_state>)
                    ] 306)))) "stest.cc":15:37 313 {*movsi_internal}
     (nil))

Second - NTPOFF in comments marked as belonging to "Relocation specifiers" while 

;; TLS support
  UNSPEC_TP
  UNSPEC_TLS_GD
  UNSPEC_TLS_LD_BASE
  UNSPEC_TLSDESC
  UNSPEC_TLS_IE_SUN

Thirdly - lets recompile the same file with -fPIC option:
(call_insn/u 9 8 11 3 (parallel [
            (set (reg:DI 0 ax)
                (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")) [0  S1 A8])
                    (const_int 0 [0])))
            (unspec:DI [
                    (symbol_ref:DI ("tls_state") [flags 0x10]  <var_decl 0x7fcea13c92d0 tls_state>)
                    (reg/f:DI 7 sp)
                ] UNSPEC_TLS_GD)
        ]) "stest.cc":15:37 1048 {*tls_global_dynamic_64_di}
     (expr_list:REG_EH_REGION (const_int -2147483648 [0xffffffff80000000])
        (nil))
    (nil))
(insn 11 9 46 3 (set (reg:SI 0 ax [orig:82 _1 ] [82])
        (mem/c:SI (reg/f:DI 0 ax [88]) [2 tls_state.idx+0 S4 A32])) "stest.cc":15:37 81 {*movsi_internal}
     (nil))

You can notice that address of TLS var gathered with __tls_get_addr but next access to it not marked in any way - just regular chain set mem component_ref. Disgusting

Just note for myself what else cannot be extracted from gcc RTL:

  • pointer to members -
     despite the fact that in file cp/cp-tree.def there are OFFSET_REF & PTRMEM_CST in real RTL they are just integer constants
  • pointer to member methods - similalry
  • TLS are indistinguishable from regular global vars
  • to be continued

понедельник, 30 сентября 2024 г.

gcc plugin to collect cross-references, part 9

Lets extract some useful results from my gcc plugin for collecting cross-references: 1, 2, 3, 4, 5, 6, 7 & 8

I've noticed that plugin worked unbearably slowly on big source files (like compiling itself) - something above 30 minutes, so I add profiling to it (see details here). Profiler showed that consumed user-time was only 18 seconds - this is glare sign that plugin spending most of time on some kind of lock. After some meditation (and strace/ltrace) I finally found that root of problem was in sqlite - it sleeping in fdatasync on each data writing to its DB! The cure is to execute
PRAGMA synchronous=OFF
this gives a non-illusory x100 speed up. Can I now consider myself as one of mythical "x100 programmer", he-he?

The next problem is how to store data from multiple gcc processes into single sqlite DB (like parallel make -j X). I've decided to use Apache Thrift - old and unfashionable but it really works and you can make multithreaded RPC server in 500 lines of c++ code. Server just listen on some port and stores all data from RPC clients into sqlite DB. Command line arguments path2db port_number .To shutdown RPC server use rpcq port_number. Of course you can make your own implementation of RPC server, for example with PHP & DB/2 (unfortunately Thrift does not supports Cobol)

As a consequence now plugin has 3 implementations for data storing:

  1. in plain text files, self-test takes 14s
  2. into sqlite, self-test takes 18s
  3. in sqlite via RPC server listened on some arbitrary TCP port. Self-test takes 23s. Connection string looks like -fplugin-arg-gptest-db=localhost:port_number 

Btw sqlite version has size 2.67Mb, Thrift rpc client - 4.65Mb. Something is fundamentally wrong with "modern frameworks"

DB schema 

is very simple - in essence it consists of two tables:

symtab for storing symbols

  • id - primary key
  • mangled name of symbol/function or value of literal
  • fname - for functions this is name of file where they were declared
  • bcount - amount of basic blocks of function, can be used as some complexity metric

xrefs to link symbols and functions from which they are referred

  • id of function
  • bb - index of basic block
  • arg - if non-zero - index of function's argument
  • what - id of referred symbol
  • kind - one letter type of xref:
    • 'c' - function call
    • 'v' - call of virtual method
    • 'l' - literal
    • 'r' - just reference to some global symbol
    • 'f' - field of some struct/class/union
    • 'F' - constant (with -fplugin-arg-gptest-ic option)

Fetching results

I wrote perl script to search and dump xrefs. If I want to find functions calling virtual methods of class FPersistence:

пятница, 20 сентября 2024 г.

Tracking arguments of functions in gcc

I continuing to improve my gcc plugin for collecting cross-references: 1, 2, 3, 4, 5, 6 & 7. On this week I decided to see if I can extract source of complex types like records and most prominent kind of them is arguments of function - they are easy to identify in asm (but not so easy to bind them in gcc RTL expressions)

Having function declaration fdecl we can extract arguments with something like:

for (tree arg = DECL_ARGUMENTS (fdecl); arg; arg = DECL_CHAIN (arg))
  {
    auto a = DECL_RTL_IF_SET (arg);
    if ( a && REG_EXPR(a) ) { // do something with argument in
REG_EXPR(a)

 
I need only arguments that are a pointer or reference to record/union so I filtered them in method check_arg

Those was easiest part, and now try to find how arguments can be tracked in RTL. 

воскресенье, 15 сентября 2024 г.

bug in gcc?

It seems that gcc not always put COMPONENT_REF when access fields of structures passed by reference. For example I add today simple static function append_name(aux_type_clutch &clutch, const char *name)

It has reference to field txt of structure aux_type_clutch but RTL looks like:

(insn 20 19 21 4 (set (reg/f:DI 0 ax [89])
        (mem/f/c:DI (plus:DI (reg/f:DI 6 bp)
                (const_int -8 [0xfffffffffffffff8])) [258 clutch+0 S8 A64])) "gptest.cpp":1364:24 80 {*movdi_internal}
     (nil))
(insn 21 20 22 4 (parallel [
            (set (reg/f:DI 0 ax [orig:83 _2 ] [83])
                (plus:DI (reg/f:DI 0 ax [89])
                    (const_int 8 [0x8])))
            (clobber (reg:CC 17 flags))
        ]) "gptest.cpp":1364:24 230 {*adddi_1}
     (expr_list:REG_EQUAL (plus:DI (mem/f/c:DI (plus:DI (reg/f:DI 19 frame)
                    (const_int -8 [0xfffffffffffffff8])) [258 clutch+0 S8 A64])
            (const_int 8 [0x8]))
        (nil)))

First instruction just loads in register RAX parm_decl (of type aux_type_clutch) like

mov     rax, [rbp+clutch]

and second add to RAX just some const 0x8 (offset to field txt):

add     rax, 8

it's impossible from RTL to track back this constant to offset in COMPONENT_REF

What is more even strange - for methods you can track fields access for parameters passed by reference (like this) - for example in constructor of the same aux_type_clutch:

(insn 12 11 13 2 (set (mem:SI (plus:DI (reg/f:DI 0 ax [94])
                (const_int 40 [0x28])) [4 this_12(D)->level+0 S4 A64])
        (const_int 0 [0])) "gptest.cpp":465:4 81 {*movsi_internal}
     (nil))