To start republican work on the kernel, I first have to genesis it, which I do right below:
curl 'http://bvt-trace.net/vpatches/linux-genesis.vpatch' > linux-genesis.vpatch
curl 'http://bvt-trace.net/vpatches/linux-genesis.vpatch.bvt.sig' > linux-genesis.vpatch.bvt.sig
However, someone very bright decided to add binary documentation files to the Linux repository, so now I have to provide them separately:
curl 'http://bvt-trace.net/vpatches/linux-binary-docs.tgz' > linux-binary-docs.tgz
curl 'http://bvt-trace.net/vpatches/linux-binary-docs.tgz.bvt.sig' > linux-binary-docs.tgz.bvt.sig
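Before pressing, the two downloads can be checked against their signatures. This is a minimal sketch. It assumes the `.bvt.sig` files are detached GPG signatures and that bvt's public key is already imported. The helper name `verify_downloads` and the `GPG` override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify_downloads FILE...
# Assumes FILE.bvt.sig is a detached GPG signature for FILE and that the
# signer's public key is already in the local keyring (an assumption).
# Set GPG to use a different gpg binary (defaults to "gpg").
verify_downloads() {
    for f in "$@"; do
        if [ ! -s "$f" ] || [ ! -s "$f.bvt.sig" ]; then
            echo "missing or empty: $f or $f.bvt.sig" >&2
            return 1
        fi
        # gpg --verify SIGFILE DATAFILE checks a detached signature.
        "${GPG:-gpg}" --verify "$f.bvt.sig" "$f" || return 1
    done
}
```

Running `verify_downloads linux-genesis.vpatch linux-binary-docs.tgz` stops at the first file whose signature is missing or does not verify.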
The genesis Linux version is 4.9.95, the one used in Cuntoo. The reasoning is as follows:
- Cuntoo is the first published republican project that includes the Linux kernel, so it is logical to use its version as the foundation.
- As mentioned before, I do not expect many changes to the Linux version used, as changing it is a mostly senseless activity: each version has some things broken and some things fixed.
- If it becomes necessary to move to an older kernel, this can always be achieved by providing a corresponding vpatch. This would perhaps require minor adaptation of the vpatches, given that there is no stable API inside the kernel. Still, it should not be too hard, unless the version difference becomes too big. One concern here is that the kernel version that at least Rockchip uses as the base of its patchset seems to be 4.4, so it may be wise to roll back to that version immediately after the genesis.
Since development will start in qemu to simplify the setup[1], I provide the configuration for a minimal kernel that will most likely not work on any real hardware, but should work just fine inside a VM:
# make ARCH=x86 allnoconfig KCONFIG_ALLCONFIG=x86_64.miniconf
# make ARCH=x86 -j $(nproc)
# boot arch/x86/boot/bzImage
CONFIG_64BIT=y
CONFIG_UNWINDER_FRAME_POINTER=y
CONFIG_PCI=y
CONFIG_BLK_DEV_SD=y
CONFIG_ATA=y
CONFIG_ATA_SFF=y
CONFIG_ATA_BMDMA=y
CONFIG_ATA_PIIX=y
CONFIG_NET_VENDOR_INTEL=y
CONFIG_E1000=y
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_RTC_CLASS=y
# CONFIG_EMBEDDED is not set
CONFIG_EARLY_PRINTK=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_SCRIPT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_INITRD=y
CONFIG_RD_GZIP=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_USE_FOR_EXT2=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_UTF8=y
CONFIG_MISC_FILESYSTEMS=y
CONFIG_SQUASHFS=y
CONFIG_SQUASHFS_XATTR=y
CONFIG_SQUASHFS_ZLIB=y
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_NETDEVICES=y
CONFIG_NET_CORE=y
#CONFIG_NETCONSOLE=y
CONFIG_ETHERNET=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_KERNEL=y
CONFIG_GDB_SCRIPTS=y
Looks too short? The file is taken from mkroot with minor changes[2]. The command in the first comment uses a trick[3] that lets you specify only the settings you need and rely on Kconfig to automatically enable their dependencies.
make allnoconfig KCONFIG_ALLCONFIG=x86_64.miniconf
make -j4
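A requested symbol whose dependencies are not satisfied may simply end up unset. For that reason, it can be worth checking after `make allnoconfig` that every `CONFIG_...=` line of the miniconf actually landed in `.config`. This is a minimal sketch that compares lines textually. The helper name `check_miniconf` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: check_miniconf MINICONF [DOTCONFIG]
# Prints every "CONFIG_FOO=value" line from MINICONF that does not appear
# verbatim in DOTCONFIG (default: .config) and fails if there were any.
# Comment lines, including "# CONFIG_FOO is not set", are not checked.
check_miniconf() {
    miniconf=$1
    dotconfig=${2:-.config}
    missing=0
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            CONFIG_*=*)
                if ! grep -qxF -- "$line" "$dotconfig"; then
                    echo "not applied: $line"
                    missing=$((missing + 1))
                fi
                ;;
        esac
    done < "$miniconf"
    [ "$missing" -eq 0 ]
}
```

Run `check_miniconf x86_64.miniconf` in the kernel tree after the configuration step. Silence and a zero exit status mean every requested option was applied.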
The build produces a bootable kernel in arch/x86/boot/bzImage and the kernel ELF file with debug symbols in ./vmlinux, which we can then use with gdb[4]:
# console 1
qemu-system-x86_64 -nographic -no-reboot -m 256M -append "panic=1 HOST=x86_64" -kernel arch/x86/boot/bzImage -s -S
# console 2
/usr/local/bin/gdb ./vmlinux -ex 'target remote localhost:1234' -ex 'set arch i386:x86-64'
Then, inside gdb (which survives the register-size change during boot only by reconnecting):
(gdb) rbreak get_random_
Breakpoint 1 at 0xffffffff811cf3b0: file drivers/char/random.c, line 1566.
void get_random_bytes(void *, int);
Breakpoint 2 at 0xffffffff811cf540: file drivers/char/random.c, line 1665.
void get_random_bytes_arch(void *, int);
Breakpoint 3 at 0xffffffff811cf7e0: file ./arch/x86/include/asm/cpufeature.h, line 151.
unsigned int get_random_int(void);
Breakpoint 4 at 0xffffffff811cf770: file ./arch/x86/include/asm/cpufeature.h, line 151.
unsigned long get_random_long(void);
(gdb) continue
Continuing.
Remote 'g' packet reply is too long (expected 308 bytes, got 536 bytes):
8020c20e0088ffffffffffff0000000000000000000000000d00000000000000040
0000000000000a8666481ffffffff68fec30e0088ffff58fec30e0088ffff383723
81ffffffff00000000000000008020c20e0088ffff0000000000000000000000000
0000000000000000000000000000000000000000000000000000000b0f31c81ffff
ffff460200001000000018000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
00000000000000000007f0300000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000801f0000
(gdb) disconnect
Ending remote debugging.
(gdb) target remote 127.0.0.1:1234
Remote debugging using 127.0.0.1:1234
At this point, we can get the backtraces of the various early users of entropy:
=> 0xffffffff811cf3b0
get_random_bytes (buf=0xffffffff816466a8
1566 {
#0 get_random_bytes (buf=0xffffffff816466a8
#1 0xffffffff81674c6d in setup_net (net=
#2 net_ns_init () at net/core/net_namespace.c:794
#3 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81674c03
#4 0xffffffff81658107 in do_initcall_level (level=
#5 do_initcalls () at init/main.c:854
#6 do_basic_setup () at init/main.c:872
#7 kernel_init_freeable () at init/main.c:1018
#8 0xffffffff812b1825 in kernel_init (unused=
#9 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#10 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0
Breakpoint 1, get_random_bytes (buf=0xffff88000ec0b008, nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ec0b008, nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff8118bb61 in bucket_table_alloc (ht=0xffff88000ec0b008, nbuckets=4, gfp=32) at lib/rhashtable.c:141
#2 0xffffffff8118c736 in rhashtable_init (ht=0xffff88000ec98000, params=0xffffffff8143bdc0
#3 0xffffffff81675606 in netlink_proto_init () at net/netlink/af_netlink.c:2684
#4 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff816755aa
#5 0xffffffff81658107 in do_initcall_level (level=
#6 do_initcalls () at init/main.c:854
#7 do_basic_setup () at init/main.c:872
#8 kernel_init_freeable () at init/main.c:1018
#9 0xffffffff812b1825 in kernel_init (unused=
#10 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#11 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0
Breakpoint 1, get_random_bytes (buf=0xffff88000ecd7f58, nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ecd7f58, nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff81244e6b in neigh_get_hash_rnd (x=
#2 neigh_hash_alloc (shift=
#3 0xffffffff81248cf4 in neigh_table_init (index=
#4 0xffffffff81676437 in arp_init () at net/ipv4/arp.c:1266
#5 0xffffffff81676a55 in inet_init () at net/ipv4/af_inet.c:1826
#6 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81676910
#7 0xffffffff81658107 in do_initcall_level (level=
#8 do_initcalls () at init/main.c:854
#9 do_basic_setup () at init/main.c:872
#10 kernel_init_freeable () at init/main.c:1018
#11 0xffffffff812b1825 in kernel_init (unused=
#12 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#13 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0
Breakpoint 1, get_random_bytes (buf=0xffffffff81646a78
1566 {
#0 get_random_bytes (buf=0xffffffff81646a78
#1 0xffffffff81675821 in rt_genid_init (net=
#2 0xffffffff8123396f in ops_init (ops=0xffffffff816d3720
#3 0xffffffff81233a67 in __register_pernet_operations (ops=
#4 register_pernet_operations (list=0xffffffff816d2e80
#5 0xffffffff8123401f in register_pernet_subsys (ops=0xffffffff816d3720
#6 0xffffffff81675a27 in ip_rt_init () at net/ipv4/route.c:2992
#7 0xffffffff81675c87 in ip_init () at net/ipv4/ip_output.c:1639
#8 0xffffffff81676a5a in inet_init () at net/ipv4/af_inet.c:1832
#9 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81676910
#10 0xffffffff81658107 in do_initcall_level (level=
#11 do_initcalls () at init/main.c:854
#12 do_basic_setup () at init/main.c:872
#13 kernel_init_freeable () at init/main.c:1018
#14 0xffffffff812b1825 in kernel_init (unused=
#15 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#16 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0
Breakpoint 1, get_random_bytes (buf=0xffff88000ed0a808, nbytes=4) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ed0a808, nbytes=4) at drivers/char/random.c:1566
#1 0xffffffff8118bb61 in bucket_table_alloc (ht=0xffff88000ed0a808, nbuckets=4, gfp=32) at lib/rhashtable.c:141
#2 0xffffffff8118c736 in rhashtable_init (ht=0xffffffff81646910
#3 0xffffffff81675b7f in inet_frags_init_net (nf=
#4 ipv4_frags_init_net (net=0xffffffff816466a0
#5 0xffffffff8123396f in ops_init (ops=0xffffffff81648ae0
#6 0xffffffff81233a67 in __register_pernet_operations (ops=
#7 register_pernet_operations (list=0xffffffff816d2e80
#8 0xffffffff8123401f in register_pernet_subsys (ops=0xffffffff81648ae0
#9 0xffffffff81675c82 in ipfrag_init () at net/ipv4/ip_fragment.c:749
#10 0xffffffff81676b00 in inet_init () at net/ipv4/af_inet.c:1873
#11 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff81676910
#12 0xffffffff81658107 in do_initcall_level (level=
#13 do_initcalls () at init/main.c:854
#14 do_basic_setup () at init/main.c:872
#15 kernel_init_freeable () at init/main.c:1018
#16 0xffffffff812b1825 in kernel_init (unused=
#17 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#18 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0
Breakpoint 1, get_random_bytes (buf=0xffffffff816f1b68
1566 {
#0 get_random_bytes (buf=0xffffffff816f1b68
#1 0xffffffff8102f15b in init_oops_id () at kernel/panic.c:490
#2 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff8102f130
#3 0xffffffff81658107 in do_initcall_level (level=
#4 do_initcalls () at init/main.c:854
#5 do_basic_setup () at init/main.c:872
#6 kernel_init_freeable () at init/main.c:1018
#7 0xffffffff812b1825 in kernel_init (unused=
#8 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#9 0x0000000000000000 in ?? ()
Continuing.
=> 0xffffffff811cf3b0
Breakpoint 1, get_random_bytes (buf=0xffff88000ec3fe90, nbytes=16) at drivers/char/random.c:1566
1566 {
#0 get_random_bytes (buf=0xffff88000ec3fe90, nbytes=16) at drivers/char/random.c:1566
#1 0xffffffff81183b85 in prandom_seed_full_state (pcpu_state=0xffffffff81636c90
#2 0xffffffff81183c00 in __prandom_reseed (late=
#3 0xffffffff8166ec53 in prandom_reseed () at lib/random32.c:298
#4 0xffffffff81657f91 in do_one_initcall (fn=0xffffffff8166ec4c
#5 0xffffffff81658107 in do_initcall_level (level=
#6 do_initcalls () at init/main.c:854
#7 do_basic_setup () at init/main.c:872
#8 kernel_init_freeable () at init/main.c:1018
#9 0xffffffff812b1825 in kernel_init (unused=
#10 0xffffffff812b5512 in ret_from_fork () at arch/x86/entry/entry_64.S:375
#11 0x0000000000000000 in ?? ()
Continuing.
Remote connection closed
Quit
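The breakpoint, backtrace and continue cycle in the session above can be automated with a gdb command file. The sketch below writes one. The helper and file names are hypothetical. It relies on gdb's documented behaviour that `commands` without an argument applies to all breakpoints created by the preceding `rbreak`. The script does not resume execution itself, because the register-size error still has to be handled by hand with `disconnect` and `target remote`.

```shell
#!/bin/sh
# Hypothetical helper: write_gdb_script FILE
# Writes a gdb command file that breaks on every function matching
# "get_random_" and, at each hit, prints a backtrace and resumes.
# It does not start execution itself: the initial "continue" and the
# disconnect/reconnect after the register-size error stay manual.
write_gdb_script() {
    cat > "$1" <<'EOF'
set pagination off
rbreak get_random_
# Without an argument, "commands" applies to all breakpoints
# created by the preceding rbreak.
commands
  bt
  continue
end
EOF
}
```

One possible invocation is `write_gdb_script random-bt.gdb`, followed by `/usr/local/bin/gdb ./vmlinux -ex 'target remote localhost:1234' -ex 'set arch i386:x86-64' -x random-bt.gdb`. After that, type `continue`, `disconnect` and `target remote localhost:1234` as in the transcript.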
Those seem to be all the early users of random data during kernel boot:
- Network namespaces;
- The resizable hash table implementation (used, in turn, by the IPv4 fragmentation and Netlink subsystems);
- The ARP neighbour table;
- Initialization of the per-network-namespace generation identifiers used by the IPv4 routing code;
- The provider of IDs for kernel oops/panic reports;
- Reseeding of the simple PRNG that makes no cryptographic pretence.
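The list above can be rebuilt mechanically from the gdb session by counting the immediate caller (frame #1) of each hit. This sketch assumes the session was saved to a file and uses the frame formats shown in the transcript. The name `summarize_callers` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize_callers LOGFILE...
# Counts the immediate caller (frame #1) of each breakpoint hit in a saved
# gdb session log, most frequent first. Handles both frame formats gdb prints:
#   "#1 0xADDR in func (args) at file:line"   (regular frame)
#   "#1 func (args) at file:line"             (no address, e.g. inlined)
summarize_callers() {
    awk '$1 == "#1" { f = ($3 == "in") ? $4 : $2; print f }' "$@" |
        sort | uniq -c | sort -rn
}
```

Fed the transcript above, this should report bucket_table_alloc twice, plus setup_net, neigh_get_hash_rnd, rt_genid_init, init_oops_id and prandom_seed_full_state once each.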
To the best of my knowledge, where this random data feeds a hash table, it is used as part of the hash function, to prevent denial-of-service attacks that work by crafting collisions in the table.
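The idea can be illustrated with a toy seeded bucket function. This is not the kernel's hash function. It is an XOR-then-multiply hash, chosen only because it fits in shell arithmetic. The point it shows is that the bucket a key lands in depends on the seed. An attacker who cannot read the seed therefore cannot precompute a set of colliding keys.

```shell
#!/bin/sh
# Toy illustration only, NOT the kernel's hash function.
# seeded_bucket KEY SEED BITS: prints the bucket index (0 .. 2^BITS - 1) of
# KEY in a table of 2^BITS buckets, using an XOR-then-multiply hash keyed by
# SEED. 2654435761 (0x9E3779B1) is a common multiplicative-hashing constant
# close to 2^32 divided by the golden ratio; the top BITS bits of the
# 32-bit product select the bucket.
seeded_bucket() {
    key=$1 seed=$2 bits=$3
    # Masking to 31 bits keeps the product below 2^63, so shell
    # arithmetic cannot overflow.
    x=$(( (key ^ seed) & 0x7FFFFFFF ))
    h=$(( (x * 2654435761) & 0xFFFFFFFF ))
    echo $(( h >> (32 - bits) ))
}
```

For example, `seeded_bucket 1 0 4` prints 9 while `seeded_bucket 1 1 4` prints 0, so the same key moves to another bucket when the seed changes. Anyone who knows the seed could still search for collisions offline, which is why the seed has to come from a random source at boot rather than being a constant.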
I also ran some more tests using ftrace on the toiletbox Debian machine. For this, I added the following to the kernel command line (it is a single line, wrapped here for readability):
ftrace=function_graph ftrace_filter=map_vdso_randomized,arch_randomize_brk,stack_maxrandom_size,cache_random_seq_create,
cache_random_seq_destroy,queue_show_random,queue_store_random,kaslr_get_random_long,random_poll,add_device_randomness,
del_random_ready_callback,random_fasync,add_random_ready_callback,random_write,add_timer_randomness,add_input_randomness,
add_disk_randomness,_random_read.part.37,random_read,add_interrupt_randomness,random_ioctl,add_hwgenerator_randomness,
get_random_bytes,get_random_bytes_arch,get_random_long,get_random_int,urandom_read,SyS_getrandom,randomize_page,
tpm_get_random,tpm2_get_random,cdrom_get_random_writable,device_is_not_random
The list of functions was produced by
# cat /sys/kernel/debug/tracing/available_filter_functions | grep random | tr '\n' , ; echo
followed by some minor filtering. This enables ftrace at an early point in the boot process. After rebooting, I ran:
# cat /sys/kernel/debug/tracing/trace > /tmp/trace.txt
# grep random /tmp/trace.txt | grep -v systemd-random | cut -c 48- | sort | uniq -c
143 add_device_randomness <-posix_cpu_timers_exit
2 add_device_randomness <-register_netdevice
9 add_device_randomness <-usb_new_device
144 add_disk_randomness <-scsi_end_request
868 add_interrupt_randomness <-handle_irq_event_percpu
1 add_timer <-prandom_reseed
73 add_timer_randomness <-add_disk_randomness
91 align_vdso_addr <-map_vdso_randomized
91 arch_randomize_brk <-load_elf_binary
188 cache_random_seq_create <-enable_cpucache
4573 __check_object_size <-urandom_read
9 credit_entropy_bits <-add_interrupt_randomness
73 credit_entropy_bits <-add_timer_randomness
149 _crng_backtrack_protect <-get_random_bytes
149 crng_backtrack_protect <-get_random_bytes
4573 _crng_backtrack_protect <-urandom_read
4573 crng_backtrack_protect <-urandom_read
4 crng_fast_load <-add_interrupt_randomness
150 _extract_crng <-get_random_bytes
150 extract_crng <-get_random_bytes
4573 _extract_crng <-urandom_read
4573 extract_crng <-urandom_read
38 get_random_bytes <-bucket_table_alloc
1 get_random_bytes <-generate_random_uuid
1 get_random_bytes <-init_oops_id
1 get_random_bytes <-ipv6_regen_rndid
1 get_random_bytes <-kcmp_cookies_init
3 get_random_bytes <-key_alloc
91 get_random_bytes <-load_elf_binary
8 get_random_bytes <-neigh_hash_alloc
4 get_random_bytes <-prandom_seed_full_state
1 get_random_bytes <-rt_genid_init
183 get_random_int <-arch_align_stack
8 get_random_int <-bpf_jit_binary_alloc
5196 get_random_int <-cache_grow_begin
91 get_random_int <-map_vdso_randomized
174 get_random_long <-arch_mmap_rnd
184 get_random_long <-cache_random_seq_create
221 get_random_long <-copy_process.part.34
92 get_random_long <-load_elf_binary
91 get_random_long <-randomize_page
184 __kmalloc <-cache_random_seq_create
91 map_vdso_randomized <-load_elf_binary
308 _mix_pool_bytes <-add_device_randomness
9 __mix_pool_bytes <-add_interrupt_randomness
9 _mix_pool_bytes <-add_interrupt_randomness
73 mix_pool_bytes <-add_timer_randomness
10 printk <-urandom_read
91 randomize_page <-load_elf_binary
154 _raw_spin_lock_irqsave <-add_device_randomness
10 _raw_spin_lock_irqsave <-urandom_read
9 _raw_spin_trylock <-add_interrupt_randomness
1 _raw_spin_trylock <-__prandom_reseed
154 _raw_spin_unlock_irqrestore <-add_device_randomness
73 _raw_spin_unlock_irqrestore <-add_timer_randomness
299 _raw_spin_unlock_irqrestore <-get_random_bytes
1 _raw_spin_unlock_irqrestore <-prandom_reseed
9156 _raw_spin_unlock_irqrestore <-urandom_read
184 stack_maxrandom_size <-arch_pick_mmap_layout
4573 SyS_getrandom <-do_syscall_64
4573 urandom_read <-vfs_read
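The `cut -c 48-` step depends on the exact column layout of the trace output. That layout can shift with trace options such as irq-info, or with wider timestamps. The variant sketched below instead keeps whatever follows the timestamp's `: `. It assumes the function tracer's `func <-caller` line format seen in the output above. The name `summarize_trace` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarize_trace TRACEFILE
# Same idea as the grep | cut -c 48- | sort | uniq -c pipeline, but instead of
# cutting at a fixed column it keeps the "func <-caller" part that follows the
# "TIMESTAMP: " field of function-tracer output lines.
summarize_trace() {
    grep random "$1" | grep -v systemd-random |
        sed -n 's/^.*[0-9]: \([^ ]* <-[^ ]*\)$/\1/p' |
        sort | uniq -c
}
```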
This list also shows that process creation and the kernel object (slab) freelist implementation consume entropy as well[5].
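The function list passed to `ftrace_filter=` (the `grep random | tr '\n' ,` step plus the manual filtering) could also be produced in one go. This sketch assumes one function per line in available_filter_functions, optionally followed by a module name in brackets. `build_ftrace_filter` and its exclusion-regex argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build_ftrace_filter FUNCS_FILE [EXCLUDE_ERE]
# Reads a list such as available_filter_functions (one function per line,
# optionally followed by " [module]"), keeps names containing "random",
# drops those matching EXCLUDE_ERE, and joins the rest with commas,
# without the trailing comma that "tr '\n' ," would leave.
build_ftrace_filter() {
    exclude=${2:-'^$'}
    awk '{ print $1 }' "$1" | grep random | grep -Ev -- "$exclude" | paste -sd, -
}
```

The result can then be pasted after `ftrace_filter=` on the kernel command line.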
I expect that the next posts on this topic will already contain some code.
[1] No need for serial cables, setting up PXE, or copying the kernel to USB sticks.
[2] I have removed IPv6 and added debug symbols. Also, don't miss the active comment setting CONFIG_EMBEDDED: it cannot be skipped because Kconfig looks for it.
[3] Yes, a trick, because apart from a short section in Documentation/kbuild/kconfig.txt it is hardly documented, in spite of its usefulness.
[4] To be clear, this gdb-qemu integration barely works when the CPU changes bitness during boot: I actually had to get the latest gdb to avoid crashes, and even then it required a reconnect.
[5] Exotica like BPF JIT and IPv6 (funnily enough, that kernel does have IPv6 disabled via the kernel command line) are hardly worth mentioning.
"However, someone very bright decided to add binary documentation files to the Linux repository, so now I have to provide them separately:"
Maybe I am missing something here, but why would the documentation, which is specifically meant to be read by humans, be included as binary instead of text?
The binary documentation files are non-SVG images and PDFs, which fits the description of "binary documentation" perfectly.
> In case there is a necessity to move to older kernel, this can always be achieved by providing a corresponding vpatch.
I found myself stuck on this. My "M" kernel is written for 3.x, whereas RK requires 4.x; the latter includes the "device tree" gnarl (and much else, just about *all* iron-related code was rewritten) and consequently the diff distance b/w a 3.x and 4.x is gargantuan.
Indeed, in the future supporting multiple versions of kernel may become a necessity due to architecture requirements; however I'd like to keep the 'zoo' of actively used versions as small as possible.
Re. 3.x-4.x diff: I expect that it is huge, but I'd also bet that it's rewritten in the sense that function names/APIs changed, not that the code that does the actual work (I wonder what percentage of kernel LoC falls into this category?) is different.
Which leads to the following question: is version 4.9 totally unusable for MIPS due to some fundamental breakage? Or is it just the version you (or IIRC original image provider?) picked, and moving that kernel to version 4.9 is still possible?
> not that the code that does actual work
Nope, much of the functionality in device drivers has been moved to "device tree files", and out of the code.
> is version 4.9 totally unusable for MIPS
Theoretically usable, but would have to rewrite entirely. And was reluctant to pick the considerably heavier kernel.
I see. A device tree file typically contains the configuration of the SoC (voltages, peripheral configuration, where exactly to find ttys, etc.), so I assume the platform configuration moved from some header to a dtb file? The problem with device tree files is that for ARM64, supporting many devices without dtbs would be ~impossible; but if in-source support was there before, it's certainly a huge ugh.
Re kernel weight: the minimal image for x86_64 qemu produced by the procedure in this post is ~2.2 MB. The minimal kernel for MIPS should be around the same size.