Linux kernel: genesis and early entropy users

September 4th, 2019

To start republican work on the kernel, I have to genesis it first, which I do right below:

curl 'http://bvt-trace.net/vpatches/linux-genesis.vpatch' > linux-genesis.vpatch
curl 'http://bvt-trace.net/vpatches/linux-genesis.vpatch.bvt.sig' > linux-genesis.vpatch.bvt.sig

However, someone very bright decided to add binary documentation files to the Linux repository, so now I have to provide them separately:

curl 'http://bvt-trace.net/vpatches/linux-binary-docs.tgz' > linux-binary-docs.tgz
curl 'http://bvt-trace.net/vpatches/linux-binary-docs.tgz.bvt.sig' > linux-binary-docs.tgz.bvt.sig

The genesis Linux version is 4.9.95 - the one used in Cuntoo. The logic is that:

  • Cuntoo is the first published republican project that includes the Linux kernel, so it is logical to use its version as a foundation.
  • As mentioned before, I do not expect many changes to the Linux version in use, as this is a mostly senseless activity - each version has some stuff broken and some stuff fixed.
  • In case there is a necessity to move to an older kernel, this can always be achieved by providing a corresponding vpatch. Perhaps this would require a minor adaptation of the vpatches, given that there is no stable API inside the kernel. Still, it should not be too hard, unless the version difference becomes too big. One of the concerns here is that the kernel version used by at least Rockchip as the foundation of their patchset seems to be 4.4 - it may be wise to roll back to this version immediately after the genesis.

As the development will start in qemu to simplify the setup[1], I will provide the configuration for a minimal kernel that will most likely not work on any real hardware, but should work inside a VM just fine:

# make ARCH=x86 allnoconfig KCONFIG_ALLCONFIG=x86_64.miniconf
# make ARCH=x86 -j $(nproc)
# boot arch/x86/boot/bzImage

CONFIG_64BIT=y

CONFIG_UNWINDER_FRAME_POINTER=y

CONFIG_PCI=y
CONFIG_BLK_DEV_SD=y
CONFIG_ATA=y
CONFIG_ATA_SFF=y
CONFIG_ATA_BMDMA=y
CONFIG_ATA_PIIX=y

CONFIG_NET_VENDOR_INTEL=y
CONFIG_E1000=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_RTC_CLASS=y

# CONFIG_EMBEDDED is not set
CONFIG_EARLY_PRINTK=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y

CONFIG_BLK_DEV_LOOP=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_UTF8=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y

CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
#CONFIG_NETCONSOLE=y
CONFIG_ETHERNET=y

CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_KERNEL=y
CONFIG_GDB_SCRIPTS=y

Looks too short? The file is taken from mkroot with minor changes[2]; the comment at the top is a trick[3] that lets you specify only the settings you need and rely on Kconfig to fill in the rest (defaults and whatever the listed options select).
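One thing worth doing once the allnoconfig step has run: a symbol listed in the miniconf can still end up unset in the generated .config (a misspelled name, a symbol this kernel version does not have, a prerequisite that was not met), and Kconfig will not stop the build over it. A minimal sketch of a check, assuming the file names used in this post; check_miniconf is a hypothetical helper, not something shipped with the kernel tree or mkroot:

```shell
# check_miniconf MINICONF DOTCONFIG   (hypothetical helper)
# Prints every active "CONFIG_FOO=value" line of MINICONF that does not appear
# verbatim in DOTCONFIG; returns 1 if at least one such line was found.
check_miniconf() {
    missing=0
    while IFS= read -r setting; do
        case $setting in
            CONFIG_*=*) ;;          # an active assignment, check it
            *) continue ;;          # comments, "is not set" lines, blanks
        esac
        if ! grep -qxF -e "$setting" "$2"; then
            echo "not applied: $setting"
            missing=1
        fi
    done < "$1"
    return "$missing"
}

# Usage, from the top of the kernel tree after "make allnoconfig ...":
#   check_miniconf x86_64.miniconf .config
```

The "# CONFIG_... is not set" lines are deliberately skipped, so this only reports options that were requested as enabled and did not survive.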

make allnoconfig KCONFIG_ALLCONFIG=x86_64.miniconf
make -j4

This gives us the bootable kernel in arch/x86/boot/bzImage, and the kernel ELF file with debug symbols in ./vmlinux, which we can then use with gdb[4]:

# console 1
qemu-system-x86_64 -nographic -no-reboot -m 256M -append "panic=1 HOST=x86_64" -kernel arch/x86/boot/bzImage -s -S
# console 2
/usr/local/bin/gdb ./vmlinux -ex 'target remote localhost:1234' -ex 'set arch i386:x86-64'

Then, inside gdb, which survives the change of register width during boot (32-bit vs 64-bit) only via a reconnect:


(gdb) rbreak get_random_
Breakpoint 1 at 0xffffffff811cf3b0: file drivers/char/random.c, line 1566.
void get_random_bytes(void *, int);
Breakpoint 2 at 0xffffffff811cf540: file drivers/char/random.c, line 1665.
void get_random_bytes_arch(void *, int);
Breakpoint 3 at 0xffffffff811cf7e0: file ./arch/x86/include/asm/cpufeature.h, line 151.
unsigned int get_random_int(void);
Breakpoint 4 at 0xffffffff811cf770: file ./arch/x86/include/asm/cpufeature.h, line 151.
unsigned long get_random_long(void);
(gdb) continue
Continuing.
Remote 'g' packet reply is too long (expected 308 bytes, got 536 bytes):
8020c20e0088ffffffffffff0000000000000000000000000d00000000000000040
0000000000000a8666481ffffffff68fec30e0088ffff58fec30e0088ffff383723
81ffffffff00000000000000008020c20e0088ffff0000000000000000000000000
0000000000000000000000000000000000000000000000000000000b0f31c81ffff
ffff460200001000000018000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
00000000000000000007f0300000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000801f0000
(gdb) disconnect
Ending remote debugging.
(gdb) target remote 127.0.0.1:1234
Remote debugging using 127.0.0.1:1234

At this point, we can actually get the backtrace of different early users of entropy:

=> 0xffffffff811cf3b0 : push %r13
get_random_bytes (buf=0xffffffff816466a8 , nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffffffff816466a8 , nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff81674c6d in setup_net (net=, user_ns=) at net/core/net_namespace.c:285
#2 net_ns_init () at net/core/net_namespace.c:794
#3 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81674c03 ) at init/main.c:780
#4 0xffffffff81658107 in do_initcall_level (level=) at init/main.c:846
#5 do_initcalls () at init/main.c:854
#6 do_basic_setup () at init/main.c:872
#7 kernel_init_freeable () at init/main.c:1018
#8 0xffffffff812b1825 in kernel_init (unused=) at init/main.c:946
#9 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#10 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0 : push %r13

Breakpoint 1, get_random_bytes (buf=0xffff88000ec0b008, nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ec0b008, nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff8118bb61 in bucket_table_alloc (ht=0xffff88000ec0b008, nbuckets=4, gfp=32) at lib/rhashtable.c:141
#2 0xffffffff8118c736 in rhashtable_init (ht=0xffff88000ec98000, params=0xffffffff8143bdc0 ) at lib/rhashtable.c:912
#3 0xffffffff81675606 in netlink_proto_init () at net/netlink/af_netlink.c:2684
#4 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff816755aa ) at init/main.c:780
#5 0xffffffff81658107 in do_initcall_level (level=) at init/main.c:846
#6 do_initcalls () at init/main.c:854
#7 do_basic_setup () at init/main.c:872
#8 kernel_init_freeable () at init/main.c:1018
#9 0xffffffff812b1825 in kernel_init (unused=) at init/main.c:946
#10 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#11 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0 : push %r13

Breakpoint 1, get_random_bytes (buf=0xffff88000ecd7f58, nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ecd7f58, nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff81244e6b in neigh_get_hash_rnd (x=) at net/core/neighbour.c:314
#2 neigh_hash_alloc (shift=) at net/core/neighbour.c:341
#3 0xffffffff81248cf4 in neigh_table_init (index=, tbl=0x4) at net/core/neighbour.c:1536
#4 0xffffffff81676437 in arp_init () at net/ipv4/arp.c:1266
#5 0xffffffff81676a55 in inet_init () at net/ipv4/af_inet.c:1826
#6 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81676910 ) at init/main.c:780
#7 0xffffffff81658107 in do_initcall_level (level=) at init/main.c:846
#8 do_initcalls () at init/main.c:854
#9 do_basic_setup () at init/main.c:872
#10 kernel_init_freeable () at init/main.c:1018
#11 0xffffffff812b1825 in kernel_init (unused=) at init/main.c:946
#12 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#13 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0 : push %r13

Breakpoint 1, get_random_bytes (buf=0xffffffff81646a78 , nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffffffff81646a78 , nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff81675821 in rt_genid_init (net=) at net/ipv4/route.c:2898
#2 0xffffffff8123396f in ops_init (ops=0xffffffff816d3720 , net=) at net/core/net_namespace.c:111
#3 0xffffffff81233a67 in __register_pernet_operations (ops=, list=) at net/core/net_namespace.c:865
#4 register_pernet_operations (list=0xffffffff816d2e80 , ops=0xffffffff816d3720 ) at net/core/net_namespace.c:901
#5 0xffffffff8123401f in register_pernet_subsys (ops=0xffffffff816d3720 ) at net/core/net_namespace.c:943
#6 0xffffffff81675a27 in ip_rt_init () at net/ipv4/route.c:2992
#7 0xffffffff81675c87 in ip_init () at net/ipv4/ip_output.c:1639
#8 0xffffffff81676a5a in inet_init () at net/ipv4/af_inet.c:1832
#9 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81676910 ) at init/main.c:780
#10 0xffffffff81658107 in do_initcall_level (level=) at init/main.c:846
#11 do_initcalls () at init/main.c:854
#12 do_basic_setup () at init/main.c:872
#13 kernel_init_freeable () at init/main.c:1018
#14 0xffffffff812b1825 in kernel_init (unused=) at init/main.c:946
#15 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#16 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0 : push %r13

Breakpoint 1, get_random_bytes (buf=0xffff88000ed0a808, nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ed0a808, nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff8118bb61 in bucket_table_alloc (ht=0xffff88000ed0a808, nbuckets=4, gfp=32) at lib/rhashtable.c:141
#2 0xffffffff8118c736 in rhashtable_init (ht=0xffffffff81646910 , params=0xffffffff81729730 ) at lib/rhashtable.c:912
#3 0xffffffff81675b7f in inet_frags_init_net (nf=) at ./include/net/inet_frag.h:110
#4 ipv4_frags_init_net (net=0xffffffff816466a0 ) at net/ipv4/ip_fragment.c:685
#5 0xffffffff8123396f in ops_init (ops=0xffffffff81648ae0 , net=) at net/core/net_namespace.c:111
#6 0xffffffff81233a67 in __register_pernet_operations (ops=, list=) at net/core/net_namespace.c:865
#7 register_pernet_operations (list=0xffffffff816d2e80 , ops=0xffffffff81648ae0 ) at net/core/net_namespace.c:901
#8 0xffffffff8123401f in register_pernet_subsys (ops=0xffffffff81648ae0 ) at net/core/net_namespace.c:943
#9 0xffffffff81675c82 in ipfrag_init () at net/ipv4/ip_fragment.c:749
#10 0xffffffff81676b00 in inet_init () at net/ipv4/af_inet.c:1873
#11 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81676910 ) at init/main.c:780
#12 0xffffffff81658107 in do_initcall_level (level=) at init/main.c:846
#13 do_initcalls () at init/main.c:854
#14 do_basic_setup () at init/main.c:872
#15 kernel_init_freeable () at init/main.c:1018
#16 0xffffffff812b1825 in kernel_init (unused=) at init/main.c:946
#17 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#18 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0 : push %r13

Breakpoint 1, get_random_bytes (buf=0xffffffff816f1b68 , nbytes=8) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffffffff816f1b68 , nbytes=8) at drivers/char/random.c:1566
#1 0xffffffff8102f15b in init_oops_id () at kernel/panic.c:490
#2 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff8102f130 ) at init/main.c:780
#3 0xffffffff81658107 in do_initcall_level (level=) at init/main.c:846
#4 do_initcalls () at init/main.c:854
#5 do_basic_setup () at init/main.c:872
#6 kernel_init_freeable () at init/main.c:1018
#7 0xffffffff812b1825 in kernel_init (unused=) at init/main.c:946
#8 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#9 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0 : push %r13

Breakpoint 1, get_random_bytes (buf=0xffff88000ec3fe90, nbytes=16) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ec3fe90, nbytes=16) at drivers/char/random.c:1566
#1 0xffffffff81183b85 in prandom_seed_full_state (pcpu_state=0xffffffff81636c90 ) at lib/random32.c:248
#2 0xffffffff81183c00 in __prandom_reseed (late=) at lib/random32.c:286
#3 0xffffffff8166ec53 in prandom_reseed () at lib/random32.c:298
#4 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff8166ec4c ) at init/main.c:780
#5 0xffffffff81658107 in do_initcall_level (level=) at init/main.c:846
#6 do_initcalls () at init/main.c:854
#7 do_basic_setup () at init/main.c:872
#8 kernel_init_freeable () at init/main.c:1018
#9 0xffffffff812b1825 in kernel_init (unused=) at init/main.c:946
#10 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#11 0x0000000000000000 in ?? ()
Continuing.
Remote connection closed
Quit

Those seem to be all the early users of random data during kernel boot:

  • Network namespaces;
  • Implementation of the resizable hash table (used, in turn, by the IPv4 fragmentation and Netlink subsystems);
  • ARP neighbour table;
  • Initialization of the per-network-namespace generation identifiers used by the IPv4 routing code;
  • Provider of IDs for kernel panics;
  • Reseeding of the simple no-pretence PRNG.
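A list like this can be cross-checked mechanically by pulling frame #1 (the immediate caller of get_random_bytes) out of a saved copy of the gdb session. A rough sketch, assuming the log has the same frame layout as the backtraces above; callers_from_gdb_log is a hypothetical helper and the log file name is whatever you saved the session under:

```shell
# callers_from_gdb_log LOGFILE   (hypothetical helper)
# Prints "count caller" for frame #1 of every backtrace found in LOGFILE.
callers_from_gdb_log() {
    awk '
        $1 == "#1" {
            # Frame lines look like "#1 0xADDR in func (args) at file:line";
            # inlined frames omit the "0xADDR in" part: "#1 func (args) at ...".
            name = ($2 ~ /^0x/) ? $4 : $2
            print name
        }
    ' "$1" | sort | uniq -c | sort -rn
}

# Usage:
#   callers_from_gdb_log gdb-session.log
```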

To the best of my knowledge, the random data is used as part of the hash function of these hash tables, to prevent denial-of-service attacks that work by crafting collisions in the table.
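The idea can be shown with a toy model: if the bucket index depends on a secret seed as well as on the key, an attacker who does not know the seed cannot precompute a set of keys that all land in one bucket. The sketch below is only an illustration under loud assumptions: bucket_for is a made-up helper, and sha256sum (from coreutils) stands in for the hash, which is not what the kernel actually uses.

```shell
# bucket_for SEED KEY NBUCKETS   (toy model, not kernel code)
# Mixes SEED into the hash of KEY and maps the result onto NBUCKETS buckets.
bucket_for() {
    # First 8 hex digits (32 bits) of the digest of "SEED:KEY".
    digest=$(printf '%s:%s' "$1" "$2" | sha256sum | cut -c1-8)
    echo $(( 0x$digest % $3 ))
}

# Usage: the same key is placed independently under two different seeds,
# so a collision set crafted for one seed is useless against the other.
#   bucket_for seedA 10.0.0.1 4
#   bucket_for seedB 10.0.0.1 4
```

With a fixed, publicly known seed the placement of every key is predictable, which is exactly what the boot-time random value is there to prevent.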

Additionally, I did some more tests using ftrace on the toiletbox Debian machine. For this, I added the following to the kernel command line:

ftrace=function_graph ftrace_filter=map_vdso_randomized,arch_randomize_brk,stack_maxrandom_size,cache_random_seq_create,
cache_random_seq_destroy,queue_show_random,queue_store_random,kaslr_get_random_long,random_poll,add_device_randomness,
del_random_ready_callback,random_fasync,add_random_ready_callback,random_write,add_timer_randomness,add_input_randomness,
add_disk_randomness,_random_read.part.37,random_read,add_interrupt_randomness,random_ioctl,add_hwgenerator_randomness,
get_random_bytes,get_random_bytes_arch,get_random_long,get_random_int,urandom_read,SyS_getrandom,randomize_page,
tpm_get_random,tpm2_get_random,cdrom_get_random_writable,device_is_not_random

Where the list of functions was produced by

# cat /sys/kernel/debug/tracing/available_filter_functions | grep random | tr '\n' , ; echo

and some minor filtering. This enables ftrace at some early point in the boot process. After reboot, I did:

# cat /sys/kernel/debug/tracing/trace > /tmp/trace.txt
# grep random /tmp/trace.txt | grep -v systemd-random  | cut -c 48- | sort | uniq -c
    143  add_device_randomness <-posix_cpu_timers_exit
      2  add_device_randomness <-register_netdevice
      9  add_device_randomness <-usb_new_device
    144  add_disk_randomness <-scsi_end_request
    868  add_interrupt_randomness <-handle_irq_event_percpu
      1  add_timer <-prandom_reseed
     73  add_timer_randomness <-add_disk_randomness
     91  align_vdso_addr <-map_vdso_randomized
     91  arch_randomize_brk <-load_elf_binary
    188  cache_random_seq_create <-enable_cpucache
   4573  __check_object_size <-urandom_read
      9  credit_entropy_bits <-add_interrupt_randomness
     73  credit_entropy_bits <-add_timer_randomness
    149  _crng_backtrack_protect <-get_random_bytes
    149  crng_backtrack_protect <-get_random_bytes
   4573  _crng_backtrack_protect <-urandom_read
   4573  crng_backtrack_protect <-urandom_read
      4  crng_fast_load <-add_interrupt_randomness
    150  _extract_crng <-get_random_bytes
    150  extract_crng <-get_random_bytes
   4573  _extract_crng <-urandom_read
   4573  extract_crng <-urandom_read
     38  get_random_bytes <-bucket_table_alloc
      1  get_random_bytes <-generate_random_uuid
      1  get_random_bytes <-init_oops_id
      1  get_random_bytes <-ipv6_regen_rndid
      1  get_random_bytes <-kcmp_cookies_init
      3  get_random_bytes <-key_alloc
     91  get_random_bytes <-load_elf_binary
      8  get_random_bytes <-neigh_hash_alloc
      4  get_random_bytes <-prandom_seed_full_state
      1  get_random_bytes <-rt_genid_init
    183  get_random_int <-arch_align_stack
      8  get_random_int <-bpf_jit_binary_alloc
   5196  get_random_int <-cache_grow_begin
     91  get_random_int <-map_vdso_randomized
    174  get_random_long <-arch_mmap_rnd
    184  get_random_long <-cache_random_seq_create
    221  get_random_long <-copy_process.part.34
     92  get_random_long <-load_elf_binary
     91  get_random_long <-randomize_page
    184  __kmalloc <-cache_random_seq_create
     91  map_vdso_randomized <-load_elf_binary
    308  _mix_pool_bytes <-add_device_randomness
      9  __mix_pool_bytes <-add_interrupt_randomness
      9  _mix_pool_bytes <-add_interrupt_randomness
     73  mix_pool_bytes <-add_timer_randomness
     10  printk <-urandom_read
     91  randomize_page <-load_elf_binary
    154  _raw_spin_lock_irqsave <-add_device_randomness
     10  _raw_spin_lock_irqsave <-urandom_read
      9  _raw_spin_trylock <-add_interrupt_randomness
      1  _raw_spin_trylock <-__prandom_reseed
    154  _raw_spin_unlock_irqrestore <-add_device_randomness
     73  _raw_spin_unlock_irqrestore <-add_timer_randomness
    299  _raw_spin_unlock_irqrestore <-get_random_bytes
      1  _raw_spin_unlock_irqrestore <-prandom_reseed
   9156  _raw_spin_unlock_irqrestore <-urandom_read
    184  stack_maxrandom_size <-arch_pick_mmap_layout
   4573  SyS_getrandom <-do_syscall_64
   4573  urandom_read <-vfs_read

This list shows that process creation and the kernel object freelist implementation also consume entropy[5].
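The `cut -c 48-` in the pipeline above depends on the exact column widths of the trace output, which may differ between kernels. A field-based variant is sketched below, assuming function-tracer lines that end in "callee <-caller" as in the listing; count_random_calls is a hypothetical helper name:

```shell
# count_random_calls TRACEFILE   (hypothetical helper)
# Prints "count  callee <-caller" for every traced call in which the callee
# or the caller has "random" in its name.
count_random_calls() {
    awk '
        / <-/ {
            # The last two fields are "callee" and "<-caller"; the task name,
            # CPU, flags and timestamp in front of them are ignored.
            pair = $(NF - 1) " " $NF
            if (pair ~ /random/) count[pair]++
        }
        END { for (p in count) printf "%7d  %s\n", count[p], p }
    ' "$1" | sort -k2
}

# Usage:
#   count_random_calls /tmp/trace.txt
```

Because only the function names are matched, lines that mention "random" solely through the task name (systemd-random-seed) drop out without a separate grep -v.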

I expect that the next posts on this topic will already contain some code.

  1. No need for serial cables, setting up PXE or copying kernel to USB sticks.
  2. I have removed IPv6, and added debug symbols. Also, don't miss the active comment setting CONFIG_EMBEDDED - it cannot be skipped because Kconfig looks for it.
  3. Yes, trick - because it's not documented anywhere in spite of its usefulness.
  4. To be clear, this gdb-qemu integration barely works when the CPU changes system bitness during boot: I actually had to get the latest gdb to avoid crashes, and even then it required a reconnect.
  5. Exotica like BPF JIT, IPv6 (funnily enough, that kernel does have IPv6 disabled via kernel command line) are hardly worth the mention.

7 Responses to “Linux kernel: genesis and early entropy users”

  1. PeterL says:

    "However, someone very bright decided to add binary documentation files to the Linux repository, so now I have to provide them separately:"

    Maybe I am missing something here, but why would the documentation, which is specifically meant to be read by humans, be included as binary instead of text?

  2. bvt says:

    The binary documentation files are non-SVG images and PDFs - fits description of "binary documentation" perfectly.

  3. > In case there is a necessity to move to older kernel, this can always be achieved by providing a corresponding vpatch.

    I found myself stuck on this. My "M" kernel is written for 3.x, whereas RK requires 4.x; the latter includes the "device tree" gnarl (and much else, just about *all* iron-related code was rewritten) and consequently the diff distance b/w a 3.x and 4.x is gargantuan.

  4. bvt says:

    Indeed, in the future supporting multiple versions of kernel may become a necessity due to architecture requirements; however I'd like to keep the 'zoo' of actively used versions as small as possible.

    Re. 3.x-4.x diff - I expect that it is huge, but I'd also bet that it's rewritten in the sense that function names/APIs changed, not that the code that does actual work (I wonder what percentage of kernel LoC falls into this category?) is different.

    Which leads to the following question: is version 4.9 totally unusable for MIPS due to some fundamental breakage? Or is it just the version you (or IIRC original image provider?) picked, and moving that kernel to version 4.9 is still possible?

  5. > not that the code that does actual work

    Nope, much of the functionality in device drivers has been moved to "device tree files", and out of the code.

    > is version 4.9 totally unusable for MIPS

    Theoretically usable, but would have to rewrite entirely. And was reluctant to pick the considerably heavier kernel.

  6. bvt says:

    I see. A device tree file typically contains the configuration of the SoC (voltages, peripherals configuration, where exactly to find ttys, etc), so I assume the platform configuration moved from some header to a dtb file? The problem with device tree files is that for ARM64, supporting many devices without dtbs would be ~impossible, but if in-source support was there before - it's certainly a huge ugh.

    Re kernel weight, the minimal image for x86_64 qemu produced by the procedure in this post is ~2.2Mb. The minimal kernel for MIPS should be around the same.

  7. [...] Linux 4.9.95 was genesis'd and feeding the RNG with FG is [...]
