Saturday, November 16, 2024

perl module for powerpc disasm

It seems there is no good open-source disassembler for PPC except Capstone. It even has a Perl binding; unfortunately it can extract only basic fields like opcode and text, but no operands for a specific processor. So I made yet another one

I would like to make a few critical remarks about Capstone itself

1) it's fat as a pig - with default settings we have
ls -l libcapstone.a
-rw-rw-r-- 1 redp redp 93833624 nov  8 16:10 libcapstone.a
Sure, this can be fixed by selecting only the needed processors, but it makes a lasting impression anyway

2) it's inconsistent. Just a couple of examples
mr r31, r3 actually has instruction ID PPC_INS_OR. To get MR you must check the alias ID, and even then there are two of them: PPC_INS_ALIAS_MR & PPC_INS_ALIAS_MR_

Another example - ld r3, something actually has PPC_REG_X3 instead of PPC_REG_R3, despite the fact that these are the same register. Why clone lots of registers if you produce the same output for them? And why not add the size of the register? I suspect this happens because they used MD from LLVM and were too lazy to make some optimizations
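To illustrate why mr shows up as OR: in the Power ISA, mr rA, rS is just an extended mnemonic for or rA, rS, rS (X-form, primary opcode 31, extended opcode 444). Below is a minimal Python sketch of that check on a raw 32-bit instruction word. The function name decode_or is made up for this example; it is not how Capstone or my module does it.

```python
def decode_or(word):
    """Decode a 32-bit PowerPC word if it is an X-form `or`, else return None.

    Illustrative helper only: when RS == RB the simplified mnemonic `mr` is printed.
    """
    primary = (word >> 26) & 0x3F        # bits 0..5: primary opcode
    extended = (word >> 1) & 0x3FF       # bits 21..30: extended opcode
    if primary != 31 or extended != 444:
        return None
    rs = (word >> 21) & 0x1F             # source register
    ra = (word >> 16) & 0x1F             # destination register
    rb = (word >> 11) & 0x1F             # second source register
    dot = "." if word & 1 else ""        # Rc bit: record form
    if rs == rb:
        return f"mr{dot} r{ra}, r{rs}"
    return f"or{dot} r{ra}, r{rs}, r{rb}"


if __name__ == "__main__":
    print(decode_or(0x7C7F1B78))         # the `mr r31, r3` case from the text
```

So a consumer that only looks at the real instruction ID sees OR and has to compare the operands (or consult the alias ID) to recover MR.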

3) it's incomplete. For example, they haven't implemented reg_access for PowerPC (nor for Alpha, RISC-V, SPARC, etc.)

multiple TOCs

It's critical to track the TOC in order to distinguish loading a constant from loading something at an address. Let's check the following code

; prologue
addis r2, r12, 0x1d8
addi r2, r2, 0x70 ; TOC
...
addis r3, r2, 0x19
ld r3, -0x1ca8(r3) ; kmem_cache

Here r3 is computed from r2, which holds the address of the TOC, to get the address of kmem_cache. And this is how loading a constant looks:
lis r10, 0xa9
ori r10, r10, 0xc00 ; A90C00
And yes - LIS is again an alias, PPC_INS_ALIAS_LIS; the instruction ID is PPC_INS_ADDIS.
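The arithmetic behind both listings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: 64-bit registers, addis/addi and the ld displacement use sign-extended 16-bit immediates, ori zero-extends its immediate, and lis rD, imm is treated as addis rD, 0, imm. All function names and the r12 value in the usage line are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sext16(value):
    """Sign-extend a 16-bit immediate."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def addis(base, imm):
    """addis rD, rA, imm: add the immediate shifted left by 16 bits."""
    return (base + (sext16(imm) << 16)) & MASK64


def addi(base, imm):
    """addi rD, rA, imm; also models the displacement of ld rD, imm(rA)."""
    return (base + sext16(imm)) & MASK64


def ori(base, imm):
    """ori rD, rA, imm: OR with a zero-extended 16-bit immediate."""
    return base | (imm & 0xFFFF)


def toc_from_prologue(r12):
    """addis r2, r12, 0x1d8 ; addi r2, r2, 0x70"""
    return addi(addis(r12, 0x1D8), 0x70)


def toc_relative(toc, hi, lo):
    """addis r3, r2, hi ; ld r3, lo(r3) -> address being loaded from"""
    return addi(addis(toc, hi), lo)


def load_constant(hi, lo):
    """lis rD, hi ; ori rD, rD, lo -> plain constant, no TOC involved"""
    return ori(addis(0, hi), lo)


if __name__ == "__main__":
    toc = toc_from_prologue(0x10000000)  # made-up function entry address in r12
    print(hex(toc), hex(toc_relative(toc, 0x19, -0x1CA8)), hex(load_constant(0xA9, 0xC00)))
```

The instruction pairs look similar, so the only way to tell an address from a constant is to know that the base register (r2 here) currently holds the TOC.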

The official documentation is very unclear about the possibility of having multiple TOCs. Theoretically, if the linker puts the TOC somewhere in the middle of the .bss/.data/.rdata sections, it can cover the full 32-bit address space. But what if the program is bigger than 4 GB, or it was linked with code from several compilers like LLVM & GCC? Unfortunately I don't have such samples, so I cannot say whether my disasm will work correctly with several TOCs

results

Anyway, enjoy - PowerPC disassembler in 120 LOC

Friday, November 1, 2024

perl module for DWARF debug info parsing

I've made a Perl binding for my C++ DWARF dumper. It supports 64/32 bit, no DWO. I'm not sure if it can load arbitrary object files, but at least it can parse LKMs - I even added relocation processing
Maybe one day I'll have the courage to write documentation for it, so for now just several samples of using it:
  • script to dump enums
  • script to find all functions having arguments with pointer or reference to some named structure
  • script to find and dump structure fields
  • script to find all classes with VTBL and dump them

So now I have a full set of Perl modules to disassemble/analyse ELF files: