Mes, Part 2: Bootstrapping Guix

Monday, April 8th, 2019

This is a continuation of the previous investigation. Because, AFAIK, the link between the last components of stage0 and mes/mescc is still a work in progress, I will describe what is going on starting from mes/mescc.

From the description of Mes [1] it may sound as if the whole Linux ecosystem could be bootstrapped with it, but this impression is wrong. Due to the proliferation of autotools, Make, and so on, you need a bunch of additional software to build almost any GNU application. In fact, producing a C compiler capable of compiling tcc, while an impressive feat due to the retardation of the C language, is just a small component of the GuixSD bootstrapping process.

GuixSD is the distribution that uses Guix as its package manager. Guix is a package/system manager for Linux/Hurd, a variation on ideas from Nix on a GNU substrate (Guile Scheme instead of a custom DSL, focus on 'libre' software). In essence, Nix is an attempt to fix some aspects of the Linux complexity explosion by implementing a scheme mentioned in the logs at the level of packages: all packages are installed as separate subtrees in the store (/gnu/store in Guix), and binaries/libraries refer to their dependencies in the store as well. The path to a package includes a hash that is calculated over the dependencies and sources of the package. To give you a taste of what the file system tree looks like:

# echo $PATH
/gnu/store/kjmqpchkfy8rvv19jvd14q40shm9fk2p-profile/bin:/gnu/store/kjmqpchkfy8rvv19jvd14q40shm9fk2p-profile/sbin:/run/setuid-programs:/root/.config/guix/current/bin:/root/.guix-profile/bin:/run/current-system/profile/bin:/run/current-system/profile/sbin
# ls -alh /gnu/store/kjmqpchkfy8rvv19jvd14q40shm9fk2p-profile/bin | head
total 0
dr-xr-xr-x 2 root root 8.3K Jan  1  1970 ./
dr-xr-xr-x 9 root root  240 Jan  1  1970 ../
lrwxrwxrwx 2 root root   64 Jan  1  1970 [ -> /gnu/store/5s2nib1lrd2101bbrivcl17kjx1mspw6-coreutils-8.30/bin/[
lrwxrwxrwx 2 root root   71 Jan  1  1970 aclocal -> /gnu/store/k7gymsw2xfp20fv30x5niilwnxpj2d2k-automake-1.16.1/bin/aclocal
lrwxrwxrwx 2 root root   76 Jan  1  1970 aclocal-1.16 -> /gnu/store/k7gymsw2xfp20fv30x5niilwnxpj2d2k-automake-1.16.1/bin/aclocal-1.16
lrwxrwxrwx 2 root root   71 Jan  1  1970 acyclic -> /gnu/store/hw4h30a6hgza5fr2pdaz69bnqyh6r0cb-graphviz-2.40.1/bin/acyclic
lrwxrwxrwx 2 root root   73 Jan  1  1970 addr2line -> /gnu/store/02iklp4swqs0ipxhg5x9b2shmj6b30h1-binutils-2.31.1/bin/addr2line
lrwxrwxrwx 2 root root   66 Jan  1  1970 ar -> /gnu/store/02iklp4swqs0ipxhg5x9b2shmj6b30h1-binutils-2.31.1/bin/ar
lrwxrwxrwx 2 root root   66 Jan  1  1970 as -> /gnu/store/02iklp4swqs0ipxhg5x9b2shmj6b30h1-binutils-2.31.1/bin/as
... (and so on)
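
The idea of a path that includes a hash over the sources and dependencies can be sketched in a few lines of Python. This is a toy illustration only: the function names are made up, and the hashing and encoding are NOT what Guix or Nix actually use (real store hashes are 32-character base32-style strings, as in the listing above). It merely shows why changing any dependency yields a different store path.

```python
import hashlib

STORE = "/gnu/store"  # store prefix, as seen in the listing above


def toy_store_path(name, source, dep_paths):
    """Toy scheme (an assumption, not the Guix algorithm): hash the source
    bytes together with the sorted store paths of all dependencies."""
    h = hashlib.sha256()
    h.update(source)
    for dep in sorted(dep_paths):
        h.update(b"\0" + dep.encode())
    # Truncated hex digest stands in for the real 32-character hash.
    return f"{STORE}/{h.hexdigest()[:32]}-{name}"


def split_store_path(path):
    """Split '/gnu/store/<hash>-<name>[/...]' into (hash, name)."""
    item = path[len(STORE) + 1:].split("/", 1)[0]
    digest, _, name = item.partition("-")
    return digest, name


# A real path from the listing above:
print(split_store_path(
    "/gnu/store/5s2nib1lrd2101bbrivcl17kjx1mspw6-coreutils-8.30/bin/["))
# → ('5s2nib1lrd2101bbrivcl17kjx1mspw6', 'coreutils-8.30')

# Hypothetical package and dependency names, for illustration only:
a = toy_store_path("example-1.0", b"source tarball", [STORE + "/aaa-libfoo-1.0"])
b = toy_store_path("example-1.0", b"source tarball", [STORE + "/bbb-libfoo-1.0"])
print(a != b)  # → True: a rebuilt dependency gives a new path
```

Since the hash of a dependency is part of its path, and that path feeds into the hash of everything built on top of it, a change at the bottom of the graph propagates to every package above it.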

Properly speaking, Mes has two components: mes and mescc. Mes is an interpreter for a useful subset of Scheme, on top of which a C compiler can be built. It ships with the sxpath and srfi-1, -16, -26 packages, and provides a LALR(1) parser generator. The C implementation is of comparably low complexity [2]:

# wc *
   256    825   6160 gc.c
   237    887   5833 hash.c
   456   1687  11193 lib.c
   219    728   4315 math.c
  2731   8157  73990 mes.c
   125    514   3568 module.c
   387   1129   7714 posix.c
   542   1892  12371 reader.c
   299   1037   6887 strings.c
    83    336   2062 struct.c
   114    404   2372 vector.c
  5449  17596 136465 total

Mescc is the actual C compiler, which transforms C source code into M1 macro assembly. The exact set of features it supports is not documented anywhere and I did not want to spend time investigating it, but the claim is that it is capable of building bash and tar (most likely, older versions of those).

However, what is interesting is not mes and mescc as freestanding pieces of software, but the bootstrapping procedure built around them.

Let's have a look at how the bootstrapping process works. For this, I used a GuixSD LiveUSB, as getting Guix to run on a toiletbox machine proved to be nontrivial: Guix requires a rather new Guile version, which I couldn't get from the distro's repositories. I took the general instructions for the experimental part from here.

# git clone git://git.savannah.nongnu.org/guix
# cd guix
# git checkout core-updates
# pkill -9 guix-daemon
# guix environment guix
# ./bootstrap
# ./configure --localstatedir=/mnt/var/guix/var/
# make

First, we clone the repository and switch it to the core-updates branch: the master branch ships a prebuilt GCC, thus making no useful attempt at bootstrapping; Mes is used on the core-updates branch; and there is a broken wip-bootstrap branch with some ongoing work (reimplementing bash/busybox in Scheme). I killed guix-daemon because later I will launch the one built in the cloned repository. The next command sets up an environment containing all the dependencies necessary for building Guix. The next three commands build all the necessary Guix binaries.

According to the documentation, the bootstrap can now happen:

# ./pre-inst-env ./guix-daemon --build-users-group=guixbuild &!
# ./pre-inst-env guix build --system=i686-linux hello

Guix will download all the packages necessary for the build, unpack them, and after about an hour GCC, binutils and other software will be bootstrapped, and finally GNU Hello, a well-known example of a terrible programming style, will be built.

The bootstrapping process is driven by two files from Guix source:

  • bootstrap.scm - seeds the environment with static binaries.
  • commencement.scm - does the actual build of the full environment using the binaries from the previous step and the sources from this step.

Both files contain package descriptions similar to Portage ebuilds, but with several packages described in a single file, and using Guile instead of the oddball shell/whatever mixture Portage uses.

It is possible to build a graph of the packages and targets that Guix uses when bootstrapping. The following command produces the tree only up to the GCC build (I don't think anyone cares about GNU Hello and its dependencies).

# ./pre-inst-env guix graph -t bag --system=i686-linux -e '(@@ (gnu packages commencement) gcc-final)' | dot -Tpng > /tmp/mes-bootstrap.png

The resulting image is reproduced below in a not-really-useful compressed form (click on the image for the full version, ~2 MB).

[Image: Mes bootstrap graph]

We can see that the following software is built, and some tools are rebuilt several times:

  • bash
  • binutils (2.20 -> 2.32)
  • bison
  • diffutils
  • file
  • findutils
  • gcc (2.95.3 -> 4.7 -> 4.9 -> 7.4)
  • gettext
  • glibc (2.2.5 -> 2.16 -> 2.28)
  • linux-headers
  • m4
  • make (3.80 -> 3.82 -> 4.2.1)
  • perl (!)
  • tcc (0.9.26 -> 0.9.27)
  • texinfo
  • zlib

The packages that have no predecessors/dependencies are:

  • bootstrap-binaries-0 - contains statically built coreutils, gawk, bash.
  • bootstrap-mes-0 - a binary distribution of mes-0.19: the statically linked Scheme interpreter mes, the C compiler in Scheme mescc, and the mescc libc (headers + static libc + crt *.o files).
  • bootstrap-mescc-tools-0.5.2 - statically linked tools: blood-elf (debug table generator), hex2, and M1, which were introduced in Part 1 of this investigation.
  • guile-bootstrap-2.0 - statically linked Guile distribution.
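
The packages without dependencies can also be picked out of the graph mechanically rather than by staring at a 2 MB image. Below is a minimal Python sketch under loud assumptions: it expects edges written as "package" -> "dependency" with the package name itself as the node identifier, which is a simplification (the real `guix graph` output may identify nodes differently and attach names as label attributes). The sample edges are invented for illustration and are not the real bootstrap graph.

```python
import re

# Matches edges of the form: "a" -> "b"
EDGE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def nodes_without_dependencies(dot_text):
    """Return nodes that are depended upon but never depend on anything,
    i.e. nodes that never appear on the left-hand side of '->'."""
    sources, targets = set(), set()
    for package, dependency in EDGE.findall(dot_text):
        sources.add(package)
        targets.add(dependency)
    return sorted(targets - sources)


# Hypothetical, heavily simplified sample (not the real graph):
SAMPLE = '''
digraph "toy" {
  "gcc-final" -> "binutils";
  "gcc-final" -> "glibc";
  "binutils" -> "bootstrap-binaries-0";
  "glibc" -> "bootstrap-mes-0";
  "glibc" -> "bootstrap-binaries-0";
}
'''

print(nodes_without_dependencies(SAMPLE))
# → ['bootstrap-binaries-0', 'bootstrap-mes-0']
```

To try this on real data one would drop the `| dot -Tpng` part of the command above, save the DOT text to a file, and adapt the regular expression to whatever node identifiers that file actually contains.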

The Guile distribution is necessary for two wrappers around ld: because of the directory structure of Guix, the libc dynamic linker gets confused about which libraries are where, so at link time additional library paths have to be passed down to the actual ld. The two scripts are fundamentally the same, and differ only in the name of the ld binary invoked and the name of the wrapper itself.
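
To make the wrapper idea more concrete, here is a rough Python sketch of what such a script could do. The real wrappers are Guile scripts and I have not reproduced their logic; the behaviour below (turning every -L directory into an additional -rpath entry) is an assumption about what "passing additional library paths" means, and the function name and store paths are hypothetical. The only real linker options used are -L, -l, -o and -rpath.

```python
def wrapped_ld_command(args, real_ld="ld"):
    """Build the command line for the real linker: keep all original
    arguments and append '-rpath DIR' for every library directory
    given via '-LDIR' or '-L DIR' (assumed behaviour, see above)."""
    lib_dirs = []
    it = iter(args)
    for arg in it:
        if arg == "-L":
            directory = next(it, None)  # separate form: -L DIR
            if directory:
                lib_dirs.append(directory)
        elif arg.startswith("-L"):
            lib_dirs.append(arg[2:])    # joined form: -LDIR
    extra = []
    for directory in dict.fromkeys(lib_dirs):  # de-duplicate, keep order
        extra += ["-rpath", directory]
    return [real_ld, *args, *extra]


# Hypothetical invocation with a made-up store path:
print(wrapped_ld_command(
    ["-L/gnu/store/xxx-glibc/lib", "-lc", "-o", "hello", "hello.o"]))
```

With the run-time search path recorded in the binary, the dynamic linker no longer has to guess where in /gnu/store a library lives. Two wrappers that differ only in `real_ld` and in their own name would then match the description above.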

So, my conclusions would be:

  • The bootstrapping process is so far incomplete, in the sense that it leaves the kernel out of its view and still requires static bash/coreutils. Apparently Guile can be cut out when bootstrapping a system with a classical file system layout.
  • mes/mescc is a small component of a larger, more complex Guix bootstrapping project, and is unlikely to be really useful outside of it.

If you have questions about any components from the graph, or in general, feel free to ask.

  1. GNU Mes is a Scheme interpreter in a minimal subset of C and a C compiler in a minimal subset of Scheme. Mes and mescc can compile a lightly patched tinycc that is self-hosting.
  2. For comparison, tinyscheme:

    # wc *.c
       146    356   3190 dynload.c
      5051  13561 141552 scheme.c
      5197  13917 144742 total
    
