Saturday, December 4, 2021

overhead of eBPF JIT

Let's try to estimate the overhead of the JIT compiler

I wrote a simple Perl script - it just counts redundant bytes for several cases:

  • the pair mov reg, rbp/add reg, imm (total length 7 bytes) can be replaced with lea reg, [rbp-imm], which is only 4 bytes
  • the pair mov reg, imm/add reg, imm can be replaced with a single load of the final address, so the second instruction can be removed
  • add reg, 1/sub reg, 1 (length 4 bytes) can be replaced with inc/dec reg (length 3 bytes)
and so on
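A minimal sketch of such a counter, written in Python rather than the original Perl (the original script is not shown here). It assumes Intel-syntax text lines like the ones in these posts; the regexes and the function name redundant_bytes are my own assumptions, and only two of the cases from the list are handled.

```python
import re

# Assumed text format: one instruction per line, Intel syntax, e.g. "mov rdi, rbp".
MOV_RBP = re.compile(r"^mov (\w+), rbp$")
ADD_IMM = re.compile(r"^add (\w+), (0x[0-9a-f]+|\d+)$")
INCDEC = re.compile(r"^(add|sub) (\w+), (0x1|1)$")

def redundant_bytes(lines):
    """Count bytes that could be saved, using the sizes quoted in the post."""
    saved = 0
    insns = [" ".join(l.split()) for l in lines if l.strip()]
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else ""
        m, a = MOV_RBP.match(cur), ADD_IMM.match(nxt)
        if m and a and m.group(1) == a.group(1):
            saved += 3   # 7-byte mov/add pair -> 4-byte lea
            i += 2
            continue
        if INCDEC.match(cur):
            saved += 1   # 4-byte add/sub 1 -> 3-byte inc/dec
        i += 1
    return saved

sample = [
    "mov rdi, rbp",
    "add rdi, 0xffffffffffffffe0",
    "add r8, 0x1",
    "sub rdi, 0x1",
]
print(redundant_bytes(sample))  # 3 + 1 + 1 = 5
```

A real counter would work on decoded instructions with their actual lengths instead of text patterns; this only illustrates the bookkeeping.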
Results

Wednesday, December 1, 2021

jitted eBPF code

Yesterday I added a disassembler for jitted eBPF code. To put it mildly, this code is very poor

Every function has 7 bytes of nops in its prologue. A comment says that this is for the BPF trampoline - well, ok

Lots of code like

 mov eax, 0x1
 cmp r14, 0x2
 jnz 0xc0561497
 xor eax, eax
0xc0561497:
 ...
Somebody, tell them about the cmovXX instructions

Lots of code like
mov rdi, 0xffff8fd687f3e000
add rdi, 0x110

and the related code to get the address of a stack variable:
mov rdi, rbp
add rdi, 0xffffffffffffffe0 
Perhaps it would be preferable to use lea rdi, [rbp-XX]

Slow inc/dec:
add r8, 0x1
sub rdi, 0x1

Lots of repeated instructions:
and rdi, 0xfff
and rdi, 0xfff
this is an obvious bug

And finally

Thursday, November 25, 2021

eBPF on cgroups

Long story short - they are stored in the array effective and in the list progs in cgroup->bpf
Below I will try to explain the boring and dirty details

cgroups

This article says:
hierarchy: a set of cgroups arranged in a tree
so we need to find the roots and then just traverse these trees. Roots have type cgroup_root and are stored in cgroup_hierarchy_idr (synchronized with the mutex cgroup_mutex). As usual, linux lies - let's compare the content of /proc/cgroups:
#subsys_name hierarchy num_cgroups enabled
cpuset 6 1 1
cpu 5 1 1
cpuacct 5 1 1
blkio 4 1 1
memory 2 148 1
devices 9 99 1
freezer 10 1 1
net_cls 7 1 1
perf_event 8 1 1
net_prio 7 1 1
hugetlb 3 1 1
pids 11 103 1
rdma 12 1 1

with the cgroup roots that actually exist on this machine:

[0]  at 0xffffffff8e9a2200 flags 8 hierarchy_id 0 nr_cgrps 145 real_cnt 144
[1] systemd at 0xffff8fd6816ea000 flags 4 hierarchy_id 1 nr_cgrps 145 real_cnt 144
[2]  at 0xffff8fd68297a000 flags 0 hierarchy_id 2 nr_cgrps 148 real_cnt 147
[3]  at 0xffff8fd68297c000 flags 0 hierarchy_id 3 nr_cgrps 1 real_cnt 0
[4]  at 0xffff8fd682978000 flags 0 hierarchy_id 4 nr_cgrps 1 real_cnt 0
[5]  at 0xffff8fd68297e000 flags 0 hierarchy_id 5 nr_cgrps 1 real_cnt 0
[6]  at 0xffff8fd6854c8000 flags 0 hierarchy_id 6 nr_cgrps 1 real_cnt 0
[7]  at 0xffff8fd6854ce000 flags 0 hierarchy_id 7 nr_cgrps 1 real_cnt 0
[8]  at 0xffff8fd6854ca000 flags 0 hierarchy_id 8 nr_cgrps 1 real_cnt 0
[9]  at 0xffff8fd6854cc000 flags 0 hierarchy_id 9 nr_cgrps 99 real_cnt 98
[10]  at 0xffff8fd685e16000 flags 0 hierarchy_id 10 nr_cgrps 1 real_cnt 0
[11]  at 0xffff8fd685e12000 flags 0 hierarchy_id 11 nr_cgrps 103 real_cnt 102
[12]  at 0xffff8fd685e14000 flags 0 hierarchy_id 12 nr_cgrps 1 real_cnt 0

Can you find roots with hierarchy ID 0 and 1 in /proc/cgroups?
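The comparison can be done mechanically. A small sketch that parses the /proc/cgroups text shown above and checks it against the hierarchy IDs 0..12 from the root dump (the helper name hierarchy_ids is mine):

```python
# /proc/cgroups content copied from the listing above.
PROC_CGROUPS = """\
#subsys_name hierarchy num_cgroups enabled
cpuset 6 1 1
cpu 5 1 1
cpuacct 5 1 1
blkio 4 1 1
memory 2 148 1
devices 9 99 1
freezer 10 1 1
net_cls 7 1 1
perf_event 8 1 1
net_prio 7 1 1
hugetlb 3 1 1
pids 11 103 1
rdma 12 1 1
"""

def hierarchy_ids(text):
    """Collect the hierarchy column from /proc/cgroups-style text."""
    ids = set()
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        _name, hierarchy, _num_cgroups, _enabled = line.split()
        ids.add(int(hierarchy))
    return ids

# hierarchy_id values 0..12 as seen in the dump of cgroup roots.
root_ids = set(range(13))
missing = sorted(root_ids - hierarchy_ids(PROC_CGROUPS))
print(missing)  # [0, 1]
```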

How to traverse this tree? It starts at the field cgrp->self, and we can use functions like css_next_descendant_pre/css_next_descendant_post. Strictly speaking they return a pointer to cgroup_subsys_state, but this is the first field (self) in cgroup, so the cast is safe

Thursday, November 11, 2021

slides from our talk at Black Hat EU 2021

link

and some

afterword

All the presented attacks are caused by misuse of the Windows logging mechanism by ETW-based EDRs. And I see it as a bad sign that the same thing is happening with eBPF on Linux. So who knows - maybe my next paper will be called "blinding eBPF-based EDRs on Linux" :-)

Friday, October 15, 2021

blinding sysmon for linux

Let's see which tracepoints it uses:


sudo ./lkmem -d -c -t ~/krnl/curr ~/krnl/System.map-5.11.0-37-generic
 __tracepoint_sched_process_exit at 0xffffffffa47140c0: enabled 1 cnt 1
  [0] 0xffffffffa2ed3b40 - kernel!perf_trace_sched_process_template
 __tracepoint_sys_exit at 0xffffffffa4714ae0: enabled 1 cnt 1
  regfunc: 0xffffffffa2fa3350 - kernel!syscall_regfunc
  unregfunc: 0xffffffffa2fa3410 - kernel!syscall_unregfunc
  [0] 0xffffffffa2f37f90 - kernel!__bpf_trace_sys_exit
 __tracepoint_sys_enter at 0xffffffffa4714b40: enabled 1 cnt 1
  regfunc: 0xffffffffa2fa3350 - kernel!syscall_regfunc
  unregfunc: 0xffffffffa2fa3410 - kernel!syscall_unregfunc
  [0] 0xffffffffa2f37e30 - kernel!__bpf_trace_sys_enter

  1. my favorite 1-bit patch - zero tracepoint->key.enabled
  2. remove the BPF client from the funcs list
  3. find the trace_event_call and install your own event_filter

Monday, October 11, 2021

BPF iterators

Of course I could not get past the hyped topic of BPF (an overvalued mechanism that lets you run your buggy code in the kernel with low performance and lots of overhead). To access some kernel data they added so-called iterators - and maybe you can even write your own and register it with bpf_iter_reg_target (spoiler: you can't, because this function is not exported. Welcome to the wonderful world of open source with unexplained and unreasonable restrictions). I was curious which BPF iterators are present in the system - they are stored in the list targets, synchronized with the mutex targets_mutex. It would seem, what could go wrong?

grep " targets" System.map-5.11.0-37-generic
ffffffff820ff8e0 r targets
ffffffff826e1240 d targets_mutex
ffffffff826e1260 d targets
ffffffff8277a5c0 d targets
ffffffff8286b2e8 d targets_supported

In this case we are dealing with another mechanism for hiding information in the linux kernel - the use of non-unique names. I was not too lazy to write a script to count such names - there are 998 of them. Top 5:

_acpi_module_name: 155
cpumask_weight.constprop.0: 47
kzalloc.constprop.0: 39
get_order: 32
kmalloc_array.constprop.0: 28
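Counting such names takes only a few lines. This is a sketch of the idea, not the script used for the numbers above; it assumes the usual three-column System.map layout (address, type, name) and uses the grep output from this post as sample input.

```python
from collections import Counter

def duplicate_names(system_map_text):
    """Return {name: count} for symbol names that occur more than once."""
    names = Counter()
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:          # address, type, name
            names[parts[2]] += 1
    return {n: c for n, c in names.items() if c > 1}

# Sample: the grep output shown above.
sample = """\
ffffffff820ff8e0 r targets
ffffffff826e1240 d targets_mutex
ffffffff826e1260 d targets
ffffffff8277a5c0 d targets
ffffffff8286b2e8 d targets_supported
"""

dups = duplicate_names(sample)
for name, count in sorted(dups.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{name}: {count}")        # targets: 3
```

For a whole kernel, pass the content of the System.map file instead of the sample string.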

As usual, the disassembler comes to the rescue

Monday, October 4, 2021

security hooks in linux kernel

This mechanism was inspired by the NSA. As described, all hooks are stored in the huge struct security_hook_heads, but its format is different in each version. We can determine which list belongs to which hook with some disasm magic. Let's look at a function that calls security hooks - for example security_path_chown:

.text:FFFFFFC010496448 security_path_chown        ; CODE XREF: chown_common+104↑p
.text:FFFFFFC010496448   STP             X29, X30, [SP,#-0x18+var_18]!
.text:FFFFFFC01049644C   MOV             X29, SP
.text:FFFFFFC010496450   STP             X20, X21, [SP,#0x18+var_s0]
.text:FFFFFFC010496454   STR             X22, [SP,#0x18+var_s10]
.text:FFFFFFC010496458   MOV             X20, path
.text:FFFFFFC01049645C   MOV             W21, W1
.text:FFFFFFC010496460   MOV             path, X30
.text:FFFFFFC010496464   MOV             W22, W2
.text:FFFFFFC010496468 loc_FFFFFFC010496468  ; DATA XREF: .init.data:FFFFFFC0111474C0↓o
.text:FFFFFFC010496468   BL              _mcount
.text:FFFFFFC01049646C   LDR             X0, [path,#8]
.text:FFFFFFC010496470   LDR             X0, [X0,#0x30]
.text:FFFFFFC010496474   LDR             W0, [X0,#0xC]
.text:FFFFFFC010496478   TBNZ            W0, #9, loc_FFFFFFC0104964B4
.text:FFFFFFC01049647C   ADRP            X0, #security_hook_heads_0.path_chown@PAGE
.text:FFFFFFC010496480   STR             X19, [X29,#0x18+var_8]
.text:FFFFFFC010496484   LDR             X19, [X0,#security_hook_heads_0.path_chown@PAGEOFF]


In the disassembly we just search for the first reference to memory near the address of security_hook_heads. Some results:

Sunday, October 3, 2021

what linux hiding

disclaimer
there is no doubt that the list below is incomplete, inaccurate etc - it's just what a very average programmer can find during two months of browsing linux source code

observability criteria
what do I mean by "hiding"? It means that there is
  • no kernel API to enumerate some structure
  • no real-time notification about setting some hook
  • no mapping to /proc or /sys (however this method is not reliable)
  • no 3rd party tool to show this. As an example I chose volatility - just because I have read their tome "The Art of Memory Forensics"
So you are unable to see them

notification chains
it is very ironic that they have APIs like register_XXX_notifier/unregister_XXX_notifier but no function like enum_XXX_notifier
no mapping to /proc or /sys
volatility checks only a very limited set - vt_notifier_list & keyboard_notifier_list

tracepoints
no API to enum clients
no notification about turning on some tracepoint
has mapping to /sys/kernel/tracing/events but can`t show clients of some tracepoint
volatility - no

kprobes
no API to enum consumers of some installed KPROBE
no notification about installing a new kprobe. This is an extremely sad fact - for example tools like LKRG don't know that some memory was patched
has mapping to /sys/kernel/debug/kprobes/
volatility - no

uprobes
no API to enum consumers of some installed UPROBE
no notification about installing new uprobe
has mapping to /sys/kernel/debug/tracing/uprobe_events. The craziest thing is that uprobes installed from the kernel are not shown
volatility - no

filesystem notifications
no API to enum all installed marks
for usermode events there is a notification via security_path_notify, for kernelmode - absolutely none
has very limited mapping to /proc/*/fdinfo/*. Again, marks installed from the kernel are not shown
volatility - no

Tuesday, September 28, 2021

PoC to hide kprobes list

as you may know, the list of kprobes is mapped to /sys in the file /sys/kernel/debug/kprobes/list. And now that I have working filesystem notifications, it is extremely tempting to try hiding the content of this file. Let's see what this inode contains:


sudo ./lkmem -s -c ~/krnl/curr ~/krnl/System.map-5.11.0-34-generic /sys/kernel/debug/kprobes/list
res /sys/kernel/debug/kprobes/list: (nil)
 inode: 0xffff8a0448d1ae40
 s_op: 0xffffffffa5067f80 - kernel!debugfs_super_operations
 inode->i_fop: 0xffffffffa506b000 - kernel!debugfs_full_proxy_file_operations
 debugfs_real_fops: 0xffffffffa5028ce0 - kernel!kprobes_fops
 private_data: 0xffffffffa5028e00 - kernel!kprobes_sops

kprobes_sops is just a struct seq_operations, and the function we need is show. So the idea is simple
  • set a notification for the file /sys/kernel/debug/kprobes/list
  • in the fsnotify_handle_event callback check the inode and mask
  • if this is the first open of this file - patch kprobes_sops->show to our own function (be cautious with WP in cr0)
  • if this is the last close of this file - restore the original handler in kprobes_sops->show
  • also restore the original handler when the driver is unloaded
You may ask - why make it so difficult? It's much easier just to patch kprobes_sops->show, right? The answer is that you minimize the risk of being discovered when the patch is in place only for a short period

Sunday, September 26, 2021

filesystem notifications in linux kernel

disclaimer

Filesystems are the most complex part of any OS. I am not a specialist in linux filesystems and I don't even commit code to the linux kernel. So none of the information here can be considered reliable; the code has tons of bugs and can damage your machine and ruin the rest of your life

Usermode notifications

linux has 3 mechanisms for passing filesystem notifications to user mode:

  1. dnotify
  2. inotify
  3. new-fashioned fanotify
all of them are projected to user mode as a file (an analogue of FilterConnectionPorts), so you can use lsof (or even just something like "ls /proc/*/fdinfo/* | xargs grep notify") to find them and the processes they belong to. Unfortunately (as usual) this information is not enough. Let's look for example at the function fanotify_fdinfo. We can notice that there are 3 possible sources of notifications:
  1. just a simple inode - for these inode->i_ino & the superblock s_dev are dumped - I don't know how you can find the mountpoint for this superblock from user mode
  2. mount point (btw struct mount is not even described in linux/include). At least, knowing mnt_id, you can find the name in the /proc/pid/mountinfo file
  3. superblock - again s_dev is dumped
Can you have real-time notifications about setting a new xxnotify? Yes, via security_path_notify. At the same time there are no notifications about removal
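The three kinds of fanotify marks can be told apart from user mode by the keys on each "fanotify" line of /proc/pid/fdinfo/N. A rough sketch, assuming the key names ino, mnt_id and sdev that fanotify_fdinfo prints; the sample lines below are made up for illustration and are not taken from a real system.

```python
def classify_fanotify_mark(line):
    """Classify one 'fanotify ...' fdinfo line by the keys it carries."""
    fields = dict(f.split(":", 1) for f in line.split()[1:] if ":" in f)
    # Inode marks carry both ino and sdev, so ino must be checked first.
    for key, kind in (("ino", "inode"), ("mnt_id", "mount"), ("sdev", "superblock")):
        if key in fields:
            return kind, fields
    return "other", fields

# Hypothetical fdinfo lines (values are invented, printed in hex by the kernel).
samples = [
    "fanotify flags:0 event-flags:8002",
    "fanotify ino:fe9d6 sdev:801 mflags:0 mask:3b ignored_mask:0",
    "fanotify mnt_id:1c mflags:0 mask:3b ignored_mask:0",
    "fanotify sdev:801 mflags:0 mask:3b ignored_mask:0",
]
kinds = [classify_fanotify_mark(s)[0] for s in samples]
print(kinds)  # ['other', 'inode', 'mount', 'superblock']
```

This only shows marks owned by user-mode file descriptors; marks installed from the kernel never appear in fdinfo.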

Kernel mode notifications
Can you have the same functionality provided by xxnotify in kernel mode? Definitely yes - kernel audit uses it. I could not find any sample code for doing this in your own driver, so I wrote one. It is not very complex (although the function fsnotify_destroy_group is not exported, so you need some sort of kallsyms lookup). You can add to tracked_inode everything you want - like the full filename, stat etc

And now the main question
Can you find all sources of filesystem notification?

Saturday, September 18, 2021

linux kernel uprobes

Let's consider another spying mechanism in the linux kernel - uprobes. They also insert int3, but this time in user mode, and can be used for example to steal TLS traffic. I wrote simple code to set up a uprobe for /usr/bin/ls on the PLT thunk of getenv:

objdump -d /usr/bin/ls
...
0000000000004710 <getenv@plt>:
    4710: f3 0f 1e fa          endbr64 
    4714: f2 ff 25 5d e5 01 00 bnd jmpq *0x1e55d(%rip)        # 22c78 <getenv@GLIBC_2.2.5>
    471b: 0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)

now run ls
ls -i /usr/bin/ls
1043126 /usr/bin/ls 
dmesg | tail
[258600.533089] uprobe ret_handler is executed, ip = 55EAECA62B54
[258600.533090] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533093] uprobe ret_handler is executed, ip = 55EAECA62B6C
[258600.533095] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533098] uprobe ret_handler is executed, ip = 55EAECA5861C
[258600.533099] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533102] uprobe ret_handler is executed, ip = 55EAECA57F60
[258600.533111] uprobe handler in PID 43831 executed, ip = 55eaeca56710
[258600.533114] uprobe ret_handler is executed, ip = 55EAECA57A77

And you can't see which uprobes are installed - the file /sys/kernel/debug/tracing/uprobe_events is empty. NSA can hide their anal catheters even in open source, yeah. So I wrote code to dump all uprobes (stored in uprobes_tree) and the consumers of each uprobe
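As a sanity check on the dmesg output above, the handler ip can be matched against the getenv@plt offset from objdump. A small worked calculation, assuming that for this binary the file offset of the PLT thunk equals its virtual address offset (typical, but not guaranteed):

```python
PAGE_SIZE = 0x1000

plt_offset = 0x4710              # getenv@plt from objdump
handler_ip = 0x55eaeca56710      # ip reported by the uprobe handler
ret_ip = 0x55EAECA62B54          # first ip reported by ret_handler

# The load base of ls in this process is ip minus the offset in the file.
load_base = handler_ip - plt_offset
print(hex(load_base))                 # 0x55eaeca52000
print(load_base % PAGE_SIZE == 0)     # True: base is page aligned

# The return address as an offset inside ls, i.e. a caller of getenv.
print(hex(ret_ip - load_base))        # 0x10b54
```

The page-aligned base confirms that the handler fired exactly on the PLT thunk, and the ret_handler addresses are the return sites inside ls.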

Thursday, September 9, 2021

linux kernel kprobes

without a doubt the most crazy and insane spying mechanism in the linux kernel is kprobes

  1. It's expensive - each time an int3 occurs, a typical call stack looks like:
    xen_asm_exc_int3
    asm_exc_int3
    irq_entries_start
    exc_int3
    do_int3
    kprobe_int3_handler
  2. It turns working with kgdb (which itself is far away from windbg) into a nightmare - the function do_int3 first calls kgdb_ll_trap
  3. There is no mechanism to predict which functions cannot be kprobed. Let's assume that your handler uses a simple printk - then you can't set a kprobe on the whole graph of functions called from printk (like vprintk_func, vprintk_default, vprintk_emit, __msecs_to_jiffies, arch_touch_nmi_watchdog, touch_softlockup_watchdog, __printk_safe_enter, _raw_spin_lock, vprintk_store, vscnprintf, cont_flush etc etc), and as far as I know there is no way to even find them all
  4. Sure, you have the /sys/kernel/debug/kprobes/list file, so you can see which functions were hooked. But there is no way to know by whom
So I wrote a dumper of installed kprobes. Sample output:

sudo ./lkmem -k -c ~/krnl/curr ~/krnl/System.map-5.11.0-34-generic
kprobes[47]: 1
 kprobe at 0xffffffffc0605080 flags 8
  addr: 0xffffffffa4a9f040 - kernel!__do_sys_fork
  pre_handler: 0xffffffffc0603548 - lkcd
  post_handler: 0xffffffffc0603526 - lkcd

Monday, September 6, 2021

linux-kernel per-cpu vars

It's hard to believe, but linux has a degraded version of the Windows KPCR - so-called "per-cpu variables". This is some isolated memory assigned to each CPU (addressed via the gs segment register on x64 and via system register c13 on arm64), and it can contain some interesting fields. Why is it important to know the offsets of some of these variables? Well, I suspect that the linux kernel contains much more code for espionage than windows (for example trace events, tracepoints, kprobes, usb_mon_register etc etc). One example of such code is the function user_return_notifier_register, with which you can register your own notifications. Unfortunately this list of notifications is stored in the per-cpu variable return_notifier_list

And as usual there is no include file with definitions of all these per-cpu fields. Moreover, these offsets depend on the kernel build config and differ in each build. Sounds like a nightmare, a reason to turn off the computer and go drink vodka while looking at the autumn rain.

Or not? Let's look in the disassembly at some functions that use this var - like fire_user_return_notifiers:
fire_user_return_notifiers proc near
 call    __fentry__ ; another entry for spy code
 mov     rax, offset unk_29450
 add     rax, gs:this_cpu_off ; .data..percpu:0000000000011368
 mov     rdi, [rax]

In this build return_notifier_list happens to have offset 0x29450 and this_cpu_off 0x11368.
Well, we can use disasm to get the offsets of both return_notifier_list & this_cpu_off and then write code like:
; rdi - this_cpu_off
; rsi - offset
get_this_gs:
mov rax, [gs:rdi]
add rax, rsi
ret

Patch on github to extract this_cpu_off & return_notifier_list with some disasm magic

Saturday, August 28, 2021

linux kernel tracing

It's hard to believe, but the linux kernel has an almost exact copy of Windows ETW - event tracing. It is just as difficult to make it work; it is poorly documented, very complex and fragile. And yes, as you can guess, it also can't show who is using it and which parts are in use. So I wrote some code to dump the functions registered in tracepoints and to check the file ops for files in /sys/kernel/tracing/events

Let's start with tracepoints. As you can see, this structure has a strange-looking list of functions in the field funcs, and the calls happen in functions like event_triggers_call. How can we find these tracepoints? Well, they are stored in trace_event_call->tp, and the array of pointers to trace_event_call is located between the symbols __start_ftrace_events and __stop_ftrace_events. Unfortunately all these treasures are located in the discardable section .init.data. But because they are all declared in the same manner, we can find them by name - all symbols with the prefix __tracepoint_ are what we need. So some examples (you can run lkmem -c -t vmlinux system.map to get this):
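Finding the tracepoints by name boils down to filtering System.map by prefix. A sketch of that step only (this is not how lkmem is implemented, and the sample lines are invented for illustration):

```python
PREFIX = "__tracepoint_"

def tracepoints(system_map_text):
    """Map tracepoint name -> address for every __tracepoint_ symbol."""
    found = {}
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2].startswith(PREFIX):
            found[parts[2][len(PREFIX):]] = int(parts[0], 16)
    return found

# Hypothetical System.map fragment; addresses and types are made up.
sample = """\
ffffffff8282e2e0 D __tracepoint_sys_exit
ffffffff8282e340 D __tracepoint_sys_enter
ffffffff81192330 T syscall_regfunc
"""

tps = tracepoints(sample)
for name, addr in sorted(tps.items()):
    print(f"{name} at {addr:#x}")
```

Note that System.map holds link-time addresses, so with KASLR the running kernel's addresses differ by the randomization offset.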

 __tracepoint_sys_enter at 0xffffffff8b82e340: enabled 0 cnt 0
  regfunc 0xffffffff8a192330 - kernel!syscall_regfunc
  unregfunc 0xffffffff8a1923f0 - kernel!syscall_unregfunc


Well, no clients right now - cnt 0

Next, about the /sys/kernel/tracing/events files (this is a perverted inhuman interface for managing trace events). I just dump file->f_path.dentry->d_inode->i_fop for each such file. Sample output (you can get this with lkmem -s vmlinux system.map path_to_some_sys_kernel_tracing_file):

Friday, August 27, 2021

arm64 disasm for linux kernel

Today I added a disassembler for the arm64 linux kernel to search for pointers. It turned out to be surprisingly difficult for several reasons (the disasm for x64 is only 383 LOC vs 618 for arm64)

One of them is the poor code produced by some gcc versions

But the main problem is the arm64 opcodes. Let's look at a simple indirect call:
  ADRP            X27, #mh_filter@PAGE
  CMP             W22, #0x3A ; ':'
  B.EQ            loc_FFFFFFC010CC7140
  CMP             W22, #0x87
  B.NE            loc_FFFFFFC010CC7188
  LDR             X2, [X27,#mh_filter@PAGEOFF]
  CBZ             X2, loc_FFFFFFC010CC7188
  MOV             X1, skb
  MOV             X0, X28
  BLR             X2
    

compare this with the code that calls the list of funcs from a tracepoint:
  ADRP            __data, #__tracepoint_cpu_idle@PAGE
  ADD             X0, X0, #__tracepoint_cpu_idle@PAGEOFF
  MOV             X29, SP
  STR             X19, [SP,#var_s10]
  LDR             X19, [X0,#(__tracepoint_powernv_throttle.funcs - 0xFFFFFFC011A562C0)]
 ...
loc_FFFFFFC01011FC60:
  LDR             X4, [X19]
  MOV             W3, W20
  LDR             X0, [X19,#8]
  MOV             X2, X21
  MOV             W1, W22
  BLR             X4
  LDR             X0, [X19,#0x18]!
  CBNZ            X0, loc_FFFFFFC01011FC60

In the second case register X4 was loaded from X19, which in turn was loaded from some memory, so I need to track how many times the content of a register was loaded
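The bookkeeping can be shown with a toy model. This is not the actual disassembler code, just a sketch under simplifying assumptions: instructions are reduced to hypothetical (op, dst, src) tuples, and every register gets a "load depth" saying how many memory loads separate it from an ADRP-produced address.

```python
def call_depths(insns):
    """Return [(register, load depth)] for every BLR in a straight-line block."""
    depth = {}
    calls = []
    for op, dst, src in insns:
        if op == "ADRP":
            depth[dst] = 0                       # plain address, nothing loaded yet
        elif op == "ADD":
            depth[dst] = depth.get(src, 0)       # address arithmetic keeps the depth
        elif op == "LDR":
            depth[dst] = depth.get(src, 0) + 1   # one more dereference
        elif op == "BLR":
            calls.append((dst, depth.get(dst)))
    return calls

# Simple indirect call: pointer is loaded once (mh_filter case).
simple = [("ADRP", "X27", None), ("LDR", "X2", "X27"), ("BLR", "X2", None)]

# Tracepoint case: funcs is loaded first, then each function pointer from it.
tracepoint = [
    ("ADRP", "X0", None), ("ADD", "X0", "X0"),
    ("LDR", "X19", "X0"), ("LDR", "X4", "X19"), ("BLR", "X4", None),
]

print(call_depths(simple))      # [('X2', 1)]
print(call_depths(tracepoint))  # [('X4', 2)]
```

A real implementation also has to follow branches, register moves and pre/post-indexed addressing, which this sketch ignores.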

Anyway, the result is +34 newly discovered function pointers

Monday, August 23, 2021

functions pointers in linux kernel data sections

I wrote a simple program to estimate the size of the problem. Yes, I know about CFI, but it seems that even on kernel 5.11 on a fresh Ubuntu this mechanism is not implemented and indirect calls look like:

  mov     rax, cs:XXX
  call    __x86_indirect_thunk_rax

__x86_indirect_thunk_rax proc near: 
  jmp     rax

The first approach is just to scan the .data section - you can do this by running

./lkmem path-to-unpacked-kernel path-to-System.map

Some results:
  • arm64 5.11.0: 9893
  • x64 5.8-53: 10698
  • x64 5.11.0: 13414
  • x64 4.18: 16224
Ok, how about pointers that are not yet initialized (or pointers in the .bss section)? We need to use a disassembler - just disassemble all functions in .text and find indirect calls and calls to __x86_indirect_thunk_XXX. Results (with the -d option):
  • x64 4.18: +42
  • x64 5.8-53: +52
  • x64 5.11.0: +45
and with .bss section (option -b):
  • x64 4.18: +99
  • x64 5.8-53: +120
  • x64 5.11.0: +109
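The first approach (scanning .data) is easy to sketch. This is an illustration of the idea and not the lkmem code: it assumes 8-byte aligned little-endian pointers and a known address range for .text, and the buffer and range below are made up.

```python
import struct

def count_code_pointers(data, text_start, text_end):
    """Count aligned 8-byte values in data that point into [text_start, text_end)."""
    count = 0
    for off in range(0, len(data) - 7, 8):
        (value,) = struct.unpack_from("<Q", data, off)
        if text_start <= value < text_end:
            count += 1
    return count

# Hypothetical .text range and a fake .data blob of four qwords.
TEXT_START, TEXT_END = 0xffffffff81000000, 0xffffffff82000000
blob = struct.pack(
    "<4Q",
    0xffffffff81000010,   # inside .text
    0,                    # null pointer
    0xffffffff82000000,   # one past the end, not counted
    0xffffffff81ffffff,   # inside .text
)
print(count_code_pointers(blob, TEXT_START, TEXT_END))  # 2
```

A value inside the .text range is not necessarily a function start, so a real tool should check hits against the symbol table.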

Sunday, August 15, 2021

dumper of linux kernel notification chains

There seems to be one little-known thing in the linux kernel - notification chains. So they have a literal analogue of PsSetLoadImageNotifyRoutine - the function register_module_notifier. And similarly they don't have a function to enumerate registered notifications - I don't know why. Maybe they were bitten by Microsoft. Or maybe I want too much from people whose "The Linux Kernel Module Programming Guide" even contains an error in a code example. Anyway, I decided to write my own (btw the last time I wrote drivers for Linux was something like 20 years ago)

How to run

git clone https://github.com/redplait/lkcd.git
cd lkcd
make
sudo insmod ./lkcd.ko
cd test
make
sudo ./dtest

Sample of output (from fresh Ubuntu):

Thursday, July 8, 2021

codewars heisenbug

I got the following crash when I tried to solve some trivial task:


UndefinedBehaviorSanitizer:DEADLYSIGNAL
==1==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x000000000020 (pc 0x0000004271a4 bp 0x7ffcf51d9a28 sp 0x7ffcf51d9070 T1)
==1==The signal is caused by a READ memory access.
==1==Hint: address points to the zero page.
==1==WARNING: invalid path to external symbolizer!
==1==WARNING: Failed to use and restart external symbolizer!
    #0 0x4271a3 (/workspace/test+0x4271a3)
    #1 0x4276d0 (/workspace/test+0x4276d0)
    #2 0x4273f0 (/workspace/test+0x4273f0)
    #3 0x427eed (/workspace/test+0x427eed)
    #4 0x4282b9 (/workspace/test+0x4282b9)
    #5 0x42abe3 (/workspace/test+0x42abe3)
    #6 0x4295ce (/workspace/test+0x4295ce)
    #7 0x429129 (/workspace/test+0x429129)
    #8 0x428d1b (/workspace/test+0x428d1b)
    #9 0x43b625 (/workspace/test+0x43b625)
    #10 0x42810d (/workspace/test+0x42810d)
    #11 0x7fdeedfdabf6 (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)
    #12 0x405339 (/workspace/test+0x405339)

ok, nothing special, just a null pointer dereference. But where? The code looks like:
double eval(const std::shared_ptr<ASTNode> &tree) {
    if ( !tree )
      return 0.0;
    switch(tree->token.type) {

At least we have the address of the crash - let's dump the first 0x40 bytes of the eval function:

   unsigned char *c = (unsigned char *)&eval;
   for ( int i = 0; i < 0x40; i++ )
     printf("%2.2X ", *(c + i));

And then put them into a 64-bit disassembler:

000000000000000d 55               push rbp
000000000000000e 4157             push r15
0000000000000010 4156             push r14
0000000000000012 53               push rbx
0000000000000013 4883ec18         sub rsp, 0x18
0000000000000017 488b37           mov rsi, [rdi]
000000000000001a 660f57c0         xorpd xmm0, xmm0
000000000000001e 4885f6           test rsi, rsi
0000000000000021 0f849d020000     jz dword 0x2c4
0000000000000027 8b4620           mov eax, [rsi+0x20] ; crash is here

WHAT? How can rsi be zero if it passed the test rsi, rsi check? Is it a buggy qemu, docker, some speculative read-ahead, or what? I doubt that I wish to continue using this service

Monday, March 15, 2021

ecdsa in driver

Let's assume that we have a buggy and dangerous driver (which "rely on many unexported functions and select them via pattern scans which are regularly revalidated against windows insider builds", he-he). Sure, we want to restrict access to it, for example like ProcessHacker does

Unfortunately the latter uses CNG and cannot work on xp/w2k3. So I made a fork of libecc to use this library with WDK7. A test driver and client are also included

How to build user-mode part

I committed VS2017 project files for the library, ec_utils and the test client - they are located in the directory vs.
Next you must sign your client:

Generate your keys (the constants BRAINPOOLP512R1, ECRDSA and SHA3_512 are hardcoded in the driver - sure, you can use whatever you want):
ec_utils.exe gen_keys BRAINPOOLP512R1 ECRDSA mykeypair

and sign your client 
ec_utils.exe sign BRAINPOOLP512R1 ECRDSA SHA3_512 testclnt.exe mykeypair_private_key.bin testclnt.sig

now copy the file mykeypair_public_key.h to the directory drv
You also need to convert the file testclnt.sig to 1.inc in the driver source code - I am too lazy to read signatures from the registry, so they are hardcoded in the driver body

How to build driver

Launch the right "Build Environment" from WDK7. The Makefile for the library is located in the directory src and the Makefile for the driver in the directory drv. I hope you know what to do with them

Run

You will need admin privileges. First install the driver
testclnt.exe full_path2_ecdsadrv.sys
and just run
testclnt.exe

If you were careful enough with the signatures, you will see something like:
IOCTL_TEST_IOCTL return 1

This means that the driver checked the EC DSA signature of your testclnt.exe and now agrees to work with it. Sure, you can have several trusted clients - just change ALLOWED_CLIENTS in vrfy.c and init each client with the right signature

And finally, when you have played enough, you can uninstall the driver:

testclnt.exe -u

Thursday, February 18, 2021

poorgcc: IDA Pro plugin to fix poor gcc code on arm64

Let's see what gcc generates for arm64 - for example gcc 7.5 and the linux kernel
Function do_sysinstr:

ADRP            X0, #__func__.48604@PAGE ; "arm64_show_signal"
ADD             X0, X0, #__func__.48604@PAGEOFF
ADRP            X3, #ctr_read_handler@PAGE
ADD             X0, X0, #0x218
ADD             X3, X3, #ctr_read_handler@PAGEOFF

Wtf happened here? Instead of loading x0 with the address of sys64_hooks we have two consecutive loads, and the value of x0 is not used in between. You can pick some random functions - this is a very common pattern; I personally think this is a bug in the gcc arm64 codegen. Anyway, it prevents IDA from showing the right xrefs, so I wrote a simple plugin for IDA Pro to fix this

The plugin just tries to find "add reg, reg, imm" instructions without a data xref and backtracks to see if this register was loaded somewhere above - sure, the code is no model of elegance. You can add a line like this to plugins.cfg

process_all_poor_gcc_functions    poorgcc64     0      1

to process all functions

Some results - after applying plugin to function do_sysinstr code looks like:

ADRP            X0, #__func__.48604@PAGE ; "arm64_show_signal"
ADD             X0, X0, #__func__.48604@PAGEOFF
ADRP            X3, #ctr_read_handler@PAGE
ADD             X0, X0, #0x218 ; FFFFFFC010C116C8
ADD             X3, X3, #ctr_read_handler@PAGEOFF


FFFFFFC010C116C8 is the address of sys64_hooks and now it has the right xref
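The address arithmetic behind this fix can be replayed in a few lines. This is a toy model, not the plugin (which works on IDA's instruction database): instructions are hypothetical (op, dst, src, imm) tuples, and the address of __func__.48604 is not given in the post, so it is derived here as FFFFFFC010C116C8 - 0x218.

```python
def resolve(insns):
    """Track ADRP/ADD chains and return the value after every resolved ADD."""
    regs = {}
    resolved = []
    for op, dst, src, imm in insns:
        if op == "ADRP":
            regs[dst] = imm                    # page address
        elif op == "ADD" and src in regs:
            regs[dst] = regs[src] + imm        # page offset or extra displacement
            resolved.append((dst, regs[dst]))
    return resolved

SYS64_HOOKS = 0xFFFFFFC010C116C8
FUNC_NAME = SYS64_HOOKS - 0x218               # derived, see the note above

insns = [
    ("ADRP", "X0", None, FUNC_NAME & ~0xFFF),  # __func__.48604@PAGE
    ("ADD", "X0", "X0", FUNC_NAME & 0xFFF),    # __func__.48604@PAGEOFF
    ("ADD", "X0", "X0", 0x218),                # the add that lacked an xref
]
for reg, value in resolve(insns):
    print(reg, f"{value:X}")
# The last line printed is: X0 FFFFFFC010C116C8
```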

Monday, February 15, 2021

fsm rules for rpcrt4!GlobalRpcServer

I already described how you can extract the address of GlobalRpcServer and the offsets of some RPC_SERVER_T fields. Let's do it for arm64 in a declarative manner using FSM

Start again with the I_RpcServerRegisterForwardFunction function - we can get the addresses of RpcHasBeenInitialized (stored with index 1) and GlobalRpcServer (index 2), and the RPC_SERVER_T.pRpcForwardFunction offset (index 3):

section .data
func I_RpcServerRegisterForwardFunction
# 1 - RpcHasBeenInitialized
stg1 load
# 2 - GlobalRpcServer
stg2 load
# 3 - ForwardFunction offset
stg3 strx

Next we can get the size of RPC_SERVER_T - from the function InitializeRpcServer, as the argument to AllocWrapper. But InitializeRpcServer is surprisingly hard to find - it is not exported and is called once from InitializeServerDLL (which is also non-exported). It uses lots of unicode strings, but unfortunately they all have the common prefix "NT AUTHORITY", which makes them indistinguishable for a 16-byte signature. But you can notice that this function registers some RPC_SERVER_INTERFACE - so we can use its content as a GUID:

Wednesday, February 10, 2021

using FSM to recover struct fields offsets

In the previous post I described a declarative way to find non-exported data and functions using FSM. But often you also need to know the offsets of some fields in structures - they can change between versions of Windows. So let's see if this can be done in the same declarative manner

Perhaps the safest way is to track the registers containing arguments to some function (btw not necessarily exported). So I added two more states to the FSM

  • ldrx register_index. Can have prefix stg N to remember this address
  • addx register_index. Can have prefix stg N to remember this address
Amazingly, that's all we need to start recovering offsets!

Let's see an example - I wrote simple rules to extract the offsets of some ETW-related structure fields. It starts with the exported function EtwRegister, which contains calls to a couple of non-exported functions: PsGetCurrentServerSiloGlobals (which you can use for example to extract the address of PspHostSiloGlobals - I'll leave this as a simple exercise for the reader) and EtwpRegisterProvider - it expects ETW_SILODRIVERSTATE as its first parameter, so we can ldrx0 here and get the ESERVERSILO_GLOBALS.EtwSiloState offset

Then process EtwpRegisterProvider - it contains calls to EtwpFindGuidEntryByGuid & EtwpAddGuidEntry and to ExAcquirePushLockExclusiveEx - from x0 we can also get the ETW_GUID_ENTRY.Lock offset

Finally process EtwpFindGuidEntryByGuid to extract ETW_GUID_ENTRY.Guid offset
Run on kernel 20251:

Monday, February 8, 2021

fsm rules syntax

I added saving and loading of FSM rules to a file - so now you can edit them (or perhaps even write new ones manually) and then apply them with the new tool afsm. So let's see how it works

  1. We must make functions distinguishable. Functions must either be exported or contain a load of some constant - from the constant pool or from the .rdata section
  2. Then these functions are disassembled and the FSM rules are applied to the code-flow graph. There may be several results, so I added global storage - it can be accessed by index from any rule (but of course this storage is separate for each processed file). Storage logic cannot be auto-derived, so you should write such rules manually - storing states must have the "stg" prefix with an index
Each rule starts with the "section" keyword - it is the section that contains the address you want to find (you can use comments starting with '#'). Then you must pick a function. If the function is exported it's easy - "func" export_name; if not - just pick the section where this function is located with "fsection" section_name
Then follow one or more FSM states:
  • load - loading from "section". Can have prefix stg N to remember this address
  • store - storing to "section". Can have prefix stg N to remember this address
  • ldrb - like "load" but for 1 byte
  • ldrh - like "load" but for 2 bytes
  • strb - like "store" but for 1 byte
  • strh - like "store" but for 2 bytes
  • gload index - load address from storage with index
  • gstore index - store to address from storage with index
  • const - load some constant from constant pool
  • rdata - load some 8 byte constant from .rdata section
  • guid - load 16 byte guid from .rdata section. Actually rdata and guid could be one state with variable size but I am too lazy
  • call_imp - call some imported function from IAT
  • call_dimp - call some function from delayed IAT
  • call_exp - call exported function
  • call - just some call, perhaps located in specific section. Can have prefix stg N to remember this address
  • gcall index - call function with address in storage
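The syntax described above can be sketched as a small parser. This is only an illustration under loud assumptions: the exact file layout is not shown here, so the sketch assumes one statement per line with whitespace-separated tokens, and the names Rule, State and parse_rules are hypothetical and are not part of afsm. Only the keywords and state names come from the list above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# State names taken from the list above; the file layout is an assumption.
STATES = {"load", "store", "ldrb", "ldrh", "strb", "strh", "gload", "gstore",
          "const", "rdata", "guid", "call_imp", "call_dimp", "call_exp",
          "call", "gcall"}

@dataclass
class State:
    name: str
    args: List[str]
    stg: Optional[int] = None      # storage index from an optional "stg N" prefix

@dataclass
class Rule:
    section: str                   # section holding the address we want to find
    func: Optional[str] = None     # exported function name ("func")
    fsection: Optional[str] = None # section of a non-exported function ("fsection")
    states: List[State] = field(default_factory=list)

def parse_rules(text):
    """Parse rules, assuming one statement per line (hypothetical layout)."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop '#' comments
        if not line:
            continue
        tok = line.split()
        if tok[0] == "section":               # every rule starts with "section"
            rules.append(Rule(section=tok[1]))
            continue
        if not rules:
            raise ValueError("statement before first 'section': " + line)
        rule = rules[-1]
        if tok[0] == "func":
            rule.func = tok[1]
        elif tok[0] == "fsection":
            rule.fsection = tok[1]
        else:
            stg = None
            if tok[0] == "stg":               # optional "stg N" prefix
                stg = int(tok[1])
                tok = tok[2:]
            if not tok or tok[0] not in STATES:
                raise ValueError("unknown state: " + line)
            rule.states.append(State(tok[0], tok[1:], stg))
    return rules

# Hypothetical rule text, written only to exercise the parser.
EXAMPLE = """\
# hypothetical rule file
section .data
func KdSystemDebugControl
ldrb
stg 0 ldrb
"""
rules = parse_rules(EXAMPLE)
print(rules[0].func, [(s.name, s.stg) for s in rules[0].states])
```

Keeping each state as a name plus free-form arguments leaves room for states such as "gload index" or "call_imp" with an import name without committing to details that the real tool may handle differently.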

Let's look at an example - say we want to find MCGEN_TRACE_CONTEXTs in the kernel, which are registered with the non-exported function McGenEventRegister_EtwRegister. There are 3 functions where this call occurs:
  • FsRtlpHeatRegisterVolume
  • IoInitSystemPreDrivers
  • PnpDiagInitialize
None of them is exported. Let's try to write rules for them

Tuesday, January 26, 2021

auto-derived FSM for usermode dlls

As expected, the results of auto-derived FSM for usermode dlls are much worse - for example, in rpcrt4.dll only 76 symbols out of 228 can be found. This is because usermode code contains far fewer unique constants (like NTSTATUS or allocation tags in the kernel). So we need some additional data to make edges more distinguishable. Let's consider several candidates

load_config

Contains the addresses of SecurityCookie and pointers to GuardCFCheckFunctionPointer & GuardCFDispatchFunctionPointer. Knowing SecurityCookie, we can at least distinguish the loading of some address in the .data section from the loading of cookies in function prologues/epilogues. But the results are almost the same - 78 out of 228

delayed import

A new source of data that is missing in kernel mode. So I added a new state to the FSM - call_dimp, almost the same as call_imp but for the delayed IAT. As expected, the results improved - 109 out of 228

constants in .rdata section

arm64 code can use not only ldr from the constant pool but also regular const data in the .rdata section - for example strings for GetProcAddress etc. Let's see what such code looks like:

Monday, January 25, 2021

W32pServiceTable from windows 10 build 20292 64bit

It seems that MS removed the whole apfnSimpleCall dispatching - the following functions are gone:

  • NtUserCallHwndParamLock
  • NtUserCallHwndParam
  • NtUserCallHwndLockSafe
  • NtUserCallHwndParamLockSafe
  • NtUserCallHwndLock
  • NtUserCallHwnd
  • NtUserCallNoParam
  • NtUserCallTwoParam
  • NtUserCallOneParam
  • NtUserCallHwndSafe
  • NtUserCallHwndOpt
Instead, all functions from apfnSimpleCall are now exported and contained in W32pServiceTable. For example (just to name a few):
  • CreateMenu -> NtUserCreateMenu
  • CreatePopupMenu -> NtUserCreatePopupMenu
  • AllowForegroundActivation -> NtUserAllowForegroundActivation
etc etc
Content of W32pServiceTable (W32pServiceLimit is 0x5AA):

Thursday, January 14, 2021

using of auto-derived state machines

Let's see what we can do with our auto-derived state machines. All source code is in my github repo

Simple case: KdLocalDebugEnabled

Assume that we want to find the address of KdLocalDebugEnabled. On kernel 18345 its RVA is 37CC18 and it is located in section .data. Run
ldr.exe -se -t 8 -der D:\work\kernel\w10\18346\arm\ntoskrnl.exe 37CC18
to build rules. Option -t sets number of threads. Results:

found at 0076D850 - KdSystemDebugControl
 ldrb exorted KdDebuggerEnabled
 ldrb
apply return 37CC18, must_be 37CC18

This rule says that we must find the exported function KdSystemDebugControl, wait for a load of the exported symbol KdDebuggerEnabled, and then the next load operation will give us the address of KdLocalDebugEnabled
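How such a rule walks through a function can be sketched with a toy matcher. This is only an illustration, not the code from ldr.exe: it assumes the disassembler has already reduced the function to a flat list of (operation, symbol, address) events, the function apply_rule is hypothetical, and every address except 0x37CC18 is made up.

```python
def apply_rule(states, events):
    """Advance through states in order; return the address seen by the last one.

    states: list of (op, symbol_or_None); None means "any operand".
    events: list of (op, symbol_or_None, address) from a hypothetical disassembler.
    """
    pos = 0
    for op, symbol, address in events:
        want_op, want_symbol = states[pos]
        if op != want_op:
            continue                      # unrelated instruction, stay in this state
        if want_symbol is not None and symbol != want_symbol:
            continue                      # right operation, wrong operand
        pos += 1
        if pos == len(states):
            return address                # final state reached: this is the answer
    return None                           # rule did not match this function

# The rule from the output above: ldrb of KdDebuggerEnabled, then any ldrb.
rule = [("ldrb", "KdDebuggerEnabled"), ("ldrb", None)]

# Made-up event stream standing in for KdSystemDebugControl.
events = [
    ("load", None, 0x100000),
    ("ldrb", None, 0x200000),             # ldrb before the anchor: ignored
    ("ldrb", "KdDebuggerEnabled", 0x300000),
    ("call_imp", "SomeImport", 0),
    ("ldrb", None, 0x37CC18),             # first ldrb after the anchor
]
print(hex(apply_rule(rule, events)))
```

The first state acts as an anchor: loads that happen before the access to KdDebuggerEnabled are skipped, so only the load that follows it is reported.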
Now apply this rule to kernel RTM 2004 (with option -T you can specify files on which to test the rules):
ldr.exe -se -t 8 -der D:\work\kernel\w10\18346\arm\ntoskrnl.exe 37CC18 -T d:\work\kernel\w10\rtm\2004\arm\ntoskrnl.exe
 ldrb exorted KdDebuggerEnabled
 ldrb
Test[0]: C3F639

Let's check this address:
// pubsym <rva 0xc3f639> KdLocalDebugEnabled

Second case: CmpTraceRoutine

IDA Pro shows 106 xrefs on kernel 18345, RVA is 8A8008. Let's see if a rule for finding this address can be derived automatically:

Sunday, January 10, 2021

efficiency of auto-derived state machines

It's time to measure how effective these state machines are. Today I wrote a simple perl script to measure how many symbols (located in sections .data, ALMOSTRO and PAGEDATA) can be found for the arm64 windows kernel. The conditions for success are

  • the found function is exported
  • or the found function uses some unique constant which is used no more than 3 times
Result on kernel build 18346:
total: 3493 symbols, found 1466

A simple state machine whose states contain only loading/storing, import/export calls and loading of some constant is able to retrieve almost 42% of symbols

PS: for afd.sys (which has no exported functions at all) the results are even better:
total: 164 symbols, found 73
44.5%
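The percentages follow directly from the counts above. A tiny check, where found_ratio is just a hypothetical helper name:

```python
def found_ratio(found, total):
    """Share of symbols that the state machines managed to locate, in percent."""
    return 100.0 * found / total

# Counts taken from the results above.
print(f"kernel: {found_ratio(1466, 3493):.1f}%")
print(f"driver without exports: {found_ratio(73, 164):.1f}%")
```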

Friday, January 8, 2021

(semi)auto building of state machine

Several days ago I made a PoC to extract addresses of WSK data from windows 10 arm64 afd.sys - specifically AfdWskClientListHead and the lock AfdWskClientSpinLock. Nothing special except the fact that afd.sys has no exported functions. So you must find some rare constant, then find the functions which use it, and only then do some disassembly, applying a state machine to each code block (see the lambda passed to traverse_simple_state_graph)

While I was writing this code, I kept wondering whether it is possible to make the computer build such state machines. And now I know that it is possible (at least for code in plain C, on a RISC-like asm with predictable instruction addresses etc etc)

Let's see how such an algorithm can be arranged:

1) you must find all cross-refs to the desired variable and collect the list of functions which use it (exactly what the deriv_hack::find_xrefs method does)

2) then you must disassemble each such function and try to extract some primitives - like loading of constants, calls of imported/exported functions etc - see the deriv_hack::make_path method. Of course the set of these primitives will be different for each processor and perhaps will depend on your tasks
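The two steps can be sketched on toy data. This is a simplified illustration, not the real deriv_hack code: the Python functions only borrow the method names, they assume each function is already reduced to a list of (primitive, argument) pairs, and the function names, constant and import below are made up.

```python
ANCHORS = ("const", "call_imp", "call_exp")   # primitives that make a path distinguishable
ACCESS = ("load", "store")

def find_xrefs(functions, target):
    """Step 1: names of functions whose primitives access the target variable."""
    return [name for name, prims in functions.items()
            if any(op in ACCESS and arg == target for op, arg in prims)]

def make_path(prims, target):
    """Step 2: keep anchor primitives up to the first access to target."""
    path = []
    for op, arg in prims:
        if op in ACCESS and arg == target:
            path.append((op, None))           # final state: its address is the result
            return path
        if op in ANCHORS:
            path.append((op, arg))            # accesses to other variables are dropped
    return None                               # target never accessed

# Made-up functions; only the target name comes from the text.
functions = {
    "FuncA": [("const", 0x1234), ("load", "OtherVar"),
              ("call_imp", "ImportedFn"), ("load", "AfdWskClientListHead")],
    "FuncB": [("const", 0x5678), ("store", "OtherVar")],
}
for name in find_xrefs(functions, "AfdWskClientListHead"):
    print(name, make_path(functions[name], "AfdWskClientListHead"))
```

The resulting path has the same shape as a rule: a few distinguishing states followed by the load or store whose address is wanted.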

Results for afd.sys!AfdWskClientListHead: