Monday, May 27, 2024

kotest

The Linux kernel allows discardable sections in an LKM, and this creates the problem of links between two kinds of memory. As you can guess, keeping a pointer to an already freed area can be very dangerous, so I made a simple tool, kotest, to check for such links. It divides the sections of an ELF file into two categories (discardable and persistent) and checks all relocations; relocs between areas of the same type are considered ok. To track whether some symbol from a persistent area is used only from discardable sections, I also use a couple of reference counts.
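The idea can be sketched in a few lines of Python. This is not the kotest source: the `Reloc` record, the counter names and the ".init" prefix rule are all assumptions made for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


def is_discardable(section: str) -> bool:
    # Assumption: only the ".init" name prefix marks a discardable section.
    return section.startswith(".init")


@dataclass
class Reloc:
    src_section: str  # section that contains the relocation site
    dst_section: str  # section the relocation points into
    dst_symbol: str   # symbol being referenced


def classify(relocs):
    """Return (dangerous, init_only) lists of symbol names.

    dangerous: discardable symbols referenced from persistent sections.
    init_only: persistent symbols referenced only from discardable sections.
    """
    from_init = defaultdict(int)        # refs coming from discardable sections
    from_persistent = defaultdict(int)  # refs coming from persistent sections
    dangerous = set()
    for r in relocs:
        src_disc = is_discardable(r.src_section)
        dst_disc = is_discardable(r.dst_section)
        if dst_disc:
            if not src_disc:
                # Persistent code/data keeps a pointer into memory that is freed.
                dangerous.add(r.dst_symbol)
            continue  # discardable -> discardable is considered ok
        if src_disc:
            from_init[r.dst_symbol] += 1
        else:
            from_persistent[r.dst_symbol] += 1
    init_only = [s for s in from_init if from_persistent[s] == 0]
    return sorted(dangerous), sorted(init_only)
```

Symbols in the second list are only candidates for moving into a discardable section.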

 

command line options

  • -b take into account variables in .bss
  • -h make hexdump of found vars
  • -v verbose mode 
To run on lots of LKMs use something like
find path_to_kernel_root -type f -name "*.ko" | xargs kotest
 
To get a summary, run awk -f total.awk on the output of the previous command

 

Is it reliable to analyze only fixups?

No - there are false positives. Consider this excerpt from ip_vs.ko, function ip_vs_register_nl_ioctl:
.init.text:0000000000016155   mov     rdi, offset ip_vs_genl_family
.init.text:000000000001615C   mov     cs:ip_vs_genl_family.module, offset __this_module
.init.text:0000000000016167   mov     cs:ip_vs_genl_family.ops, offset ip_vs_genl_ops
.init.text:0000000000016172   mov     cs:ip_vs_genl_family.mcgrps, 0
.init.text:000000000001617D   mov     qword ptr cs:ip_vs_genl_family.n_ops, 10h
.init.text:0000000000016188   call    __genl_register_family

It turns out that ip_vs_genl_ops (located in the .rodata section) is referenced only from the function ip_vs_register_nl_ioctl in .init.text, but it actually cannot be moved to a discardable area because it was registered with genl_register_family. Kotest cannot analyze how addresses are used, so it gives a false positive (FP):
.rodata + 5A0 (ip_vs_genl_ops) rref 1 xref 0 add size 768
 
Another issue is string merging by ld. Let's assume we have a couple of strings: "foobar", referenced from some function(s) in the .text section, and "bar", referenced from code in .init.text. The linker can (and usually does) put only the string "foobar" into .rodata, and the fixup for "bar" will point into the middle of this single string "foobar".
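A toy model of tail merging shows why the fixup for "bar" ends up inside "foobar". This is only a sketch of the effect, not the algorithm ld actually uses; the function name is hypothetical.

```python
def merge_suffix_strings(strings):
    """Lay out NUL-terminated strings, reusing tails of longer strings."""
    blob = b""
    offsets = {}
    # Longest first, so shorter strings can land inside already emitted ones.
    for s in sorted(set(strings), key=len, reverse=True):
        raw = s.encode() + b"\0"
        pos = blob.find(raw)  # a hit always ends on a NUL, i.e. it is a tail
        if pos < 0:
            pos = len(blob)
            blob += raw
        offsets[s] = pos
    return blob, offsets


blob, offsets = merge_suffix_strings(["foobar", "bar"])
print(blob, offsets)
```

Here "bar" gets offset 3 inside the only stored string, so a tool that looks only at relocation targets cannot tell the two strings apart.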
 
So consider the output of kotest an estimated upper bound on the memory that could potentially be saved by moving data into a discardable area

 

Why not use the famous objtool?

Because of NIH syndrome, objtool employs a disassembler, and as a consequence it is slow and supports only a few architectures. Kotest is based on elfio and can process both 32-bit and 64-bit ELF files from any arch (and it is very fast)


LKM loading

It starts in the function load_module. It's a surprisingly huge amount of buggy code, so I will briefly describe only the most important parts
There is a nasty bug: sysfs shows all sections (including freed ones). So sometimes my lkcd shows amazing results like:
Mod[60] 0xffffffffc0454300 base 0xffffffffc0451000 serio_raw
 init: 0xffffffffc037e000 - nls_iso8859_1!uni2char
 exit: 0xffffffffc0451b8a - serio_raw!serio_raw_drv_exit

The init field now points somewhere into the middle of the module nls_iso8859_1. This happened because the .init section of serio_raw was freed and its memory is now occupied by some other module. Despite this, according to the kernel, it is still listed as part of serio_raw:
ls -1a /sys/module/serio_raw/sections | grep init
.init.text 
This bug is caused by the function mod_sysfs_setup, which knows nothing about discardable sections (and perhaps should call within_module_init to filter out some sections and also save some memory on several module_sect_attr items)

 

Which sections does the kernel consider discardable?

Simple answer: those whose names start with ".init". More detailed answer: each architecture can have its own version of the function module_init_section
For example, see the arm-specific sections
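As a rough model of that check, here is a prefix test with a per-architecture override. The generic ".init" rule comes from the text above; the arm prefix list is an assumption written from memory and should be verified against the kernel tree you are analyzing.

```python
# Generic rule from the text: the section name starts with ".init".
GENERIC_INIT_PREFIXES = (".init",)

# Assumption: prefixes recalled from the 32-bit arm override; verify in your tree.
ARM_INIT_PREFIXES = (".init", ".ARM.extab.init", ".ARM.exidx.init")


def module_init_section(name, prefixes=GENERIC_INIT_PREFIXES):
    """Return True if a section with this name is treated as discardable."""
    # str.startswith accepts a tuple and matches any of its members.
    return name.startswith(prefixes)


for sec in (".init.text", ".init.array", ".ctors", ".ARM.exidx.init.text"):
    print(sec, module_init_section(sec), module_init_section(sec, ARM_INIT_PREFIXES))
```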
 
The problem is that this list is not exhaustive: some sections could be moved to the discardable area because their content is not used after module initialization. Just to name a few:

However, this is not all. Let's check the function do_mod_ctors. The field module->ctors can point to the section ".init.array" (which is considered discardable) or to ".ctors" (which is not). Logic? Haven't heard of it

 

Which data from discardable sections is the kernel able to clean up?

As you can see, the function do_init_module calls ftrace_free_mem and trim_init_extable
The latter has a very weird comment (a glaring example of the fact that sometimes no comment is much better):
If the exception table is sorted, any referring to the module init
  will be at the beginning or the end.
The problem is that the exception table (stored in module->extable) is always sorted in post_relocation
So, as you can assume, the content of the "__ex_table" section is cleaned up before the init sections are freed
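The trimming relies on the table being sorted by address: entries that point into the init area can then only sit at the two ends. A minimal sketch of that logic, with plain integers standing in for table entries and a hypothetical [init_start, init_end) range:

```python
def trim_init_entries(entries, init_start, init_end):
    """Drop entries inside [init_start, init_end) from both ends of a sorted list."""
    lo, hi = 0, len(entries)
    # Trim the beginning.
    while lo < hi and init_start <= entries[lo] < init_end:
        lo += 1
    # Trim the end.
    while lo < hi and init_start <= entries[hi - 1] < init_end:
        hi -= 1
    return entries[lo:hi]


table = [0x1000, 0x1010, 0x5000, 0x5008]  # sorted; the last two point into init
print([hex(a) for a in trim_init_entries(table, 0x5000, 0x6000)])
```

If the table were not sorted, init entries could sit in the middle and this approach would miss them.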
 
As for the first: the ftrace entries are initially loaded from the section named FTRACE_CALLSITE_SECTION. I have always believed that having ftrace entries for init functions is an idea of questionable usefulness. Sure, you can manually mark each init function with __attribute__((__no_instrument_function__)). If you are as lazy as me, entrust this task to gcc with my patch

 

Results

On 6.8.8 for aarch64 we have 37140 bytes of data referenced only from discardable sections (remember the FPs) and 245988 bytes in sections which could be moved to the discardable area
