Gales Linux is a distribution for the x86_64 architecture, started around 2017 by JWRD Consulting (jfw and dorion). There is an overview of its guiding principles and software selection in the introductory article on jfw's blog. I gave installing Gales a try and had a look at its structure.
Bootstrap & Installation
Typically, a distribution provides a ready-to-use installer, which either installs binary packages or automates building them during the installation procedure. With Gales, the user is responsible for building the binary distribution components from source (bootstrapping), and later installing them manually (copying or unarchiving) into the target partition.
The bootstrap procedure ran on the Cuntoo machine. The process went exactly as shown in the build document, modulo two or three extra patches:
- First, the GCC build failed because the newer version of makeinfo (6.3) used to prepare the documentation rejected the syntax of one of the .info files. An upstream fix is available; I introduced the corresponding changes manually.
- Second, when bootstrapping on GCC 5 [1], there is another error, with an upstream fix here. Not sure whether you want this fix in the tree, but at the very least it should do no harm.
Because the Cuntoo and Gales machines have similar hardware, I have reused the Cuntoo kernel config, which worked without any problems.
When the ./mkinstaller stage was reached, it turned out that the installer is an initramfs image. That would not be a problem in itself, had the installation target not been accessible to me only remotely, through a SystemRescueCd LiveUSB. So I looked at the list of files necessary for the installation (./installer/initramfs.list[.sh]) and scp'ed them over manually.
The outputs of the bootstrap process are a kernel image, an archive with the base root system (base-$date), and optional archives with compilers (c-$date) and ports (gports-$date).
The install happened on a machine similar to the Cuntoo one but a bit newer, with only M.2 storage on board. This turned out to be a source of problems, because lilo as shipped refuses to install on NVMe devices, suggesting the use of grub2 instead [2]. Luckily, Slackware carries a patch for this, which I reproduce below in the form that applied and built on top of the Lilo version 24.0 of Gales (Slackware uses 24.2):
diff -Naru lilo-24.2/src/common.h lilo-24.2.new/src/common.h
--- lilo-24.2/src/common.h 2015-11-21 23:50:23.000000000 +0000
+++ lilo-24.2.new/src/common.h 2018-02-15 15:13:17.411968439 +0000
@@ -386,7 +386,7 @@
 extern FILE *errstd;
 extern FILE *pp_fd;
 extern char *identify; /* in identify.c */
-extern int dm_major_list[16];
+extern int dm_major_list[32];
 extern int dm_major_nr;
 
 #define crc(a,b) (~crc32((a),(b),CRC_POLY1))
diff -Naru lilo-24.2/src/geometry.c lilo-24.2.new/src/geometry.c
--- lilo-24.2/src/geometry.c 2015-11-21 23:50:18.000000000 +0000
+++ lilo-24.2.new/src/geometry.c 2018-02-15 16:10:25.844149725 +0000
@@ -84,8 +84,9 @@
 int dm_version_nr = 0;
 #endif
 
-int dm_major_list[16];
+int dm_major_list[32]; /* increased from 16 to allow for nvme disks */
 int dm_major_nr;
+int nvme_pr = 0; /* set to none zero after geo_init if nvme disk present */
 
 #ifdef LCF_LVM
 struct lv_bmap {
@@ -200,6 +201,9 @@
 
     while(fgets(line, (sizeof line)-1, file)) {
         if (sscanf(line, "%d %31s\n", &major, major_name) != 2) continue;
+        if (strcmp(major_name, "nvme") !=0) { /* set if nvme drive is present */
+            nvme_pr=-1;
+        }
         if (strcmp(major_name, "device-mapper") != 0) continue;
         dm_major_list[dm_major_nr] = major;
         if (verbose >= 3) {
@@ -708,6 +712,22 @@
             geo->start = hdprm.start;
             break;
         case MAJOR_SATA1:
+            /* check for nvme device and assume boot/this device is nvme if present */
+            if (nvme_pr != 0) {
+                geo->device = 0x80 + last_dev(MAJOR_HD,64) + (MINOR(device) >> 4);
+                if (!get_all) break;
+                if (ioctl(fd,HDIO_GETGEO,&hdprm) < 0)
+                    die("geo_query_dev HDIO_GETGEO (dev 0x%04x): %s",device,
+                        strerror(errno));
+                if (all && !hdprm.sectors)
+                    die("HDIO_REQ not supported for your NVME controller. Please "
+                        "use a DISK section");
+                geo->heads = hdprm.heads;
+                geo->cylinders = hdprm.cylinders;
+                geo->sectors = hdprm.sectors;
+                geo->start = hdprm.start;
+                break;
+            }
         case MAJOR_SATA2:
             printf("WARNING: SATA partition in the high region (>15):\n");
             printf("LILO needs the kernel in one of the first 15 SATA partitions. If \n");
Note that NVMe support here depends on procfs being mounted. With this patch applied, lilo happily proceeded to detect the kernel and install the bootloader. In the process of building Lilo I witnessed first-hand that the build process is indeed reproducible [3]. Nice job!
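To make the procfs dependency concrete, here is a minimal sketch of the kind of scan the geo_init hunk performs: read "<major> <name>" lines and look for a driver name. The function name find_major is mine, and I am assuming the file being read is /proc/devices, which would explain why procfs has to be mounted. Unlike the patch as quoted, which sets its flag when the name differs from "nvme", the sketch matches on equality; that is my reading of the intent, not a verified fix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan a /proc/devices-style stream for a driver
 * name and return its major number, or -1 if the name is not listed. */
int find_major(FILE *devs, const char *wanted)
{
    char line[64];
    char name[32];
    int major;

    while (fgets(line, sizeof line, devs)) {
        /* Section headers and blank lines do not parse as "<number> <name>". */
        if (sscanf(line, "%d %31s", &major, name) != 2)
            continue;
        if (strcmp(name, wanted) == 0)
            return major;
    }
    return -1;
}
```

On a live system the stream would come from fopen("/proc/devices", "r"); if procfs is not mounted that call fails and there is nothing to scan.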
Other than that, everything worked out according to the INSTALL instructions.
After installation and initial configuration (still inside the chroot), I proceeded to install sshd: while the system's original use case is airgapping, that was out of the question here. The installation was straightforward, but one wart is that built packages need to be signed. I would not mind doing that, but apparently this feature is not finished yet. gpkg-install complained loudly about the lack of signatures, a check I had to disable with the -f flag, as outlined in the PORTS document. Also, for sshd to work, /dev/pts must be mounted [4].
On the build process, I have some comments. The authors enable by default a GCC flag that warns about excessive stack usage in applications, with the goal of eliminating such usage somehow (switching to malloc where other approaches fail). While I did not find anything objectionable in the patches I looked at, these patches must be double- and triple-checked, because malloc is a mechanism with non-obvious failure modes: it introduces locking and may interact in non-obvious ways with library constructors. Also, the O0-O1-O2 optimization level selection in both bootstrap and gports looks a bit arbitrary: it would be good to look at which -f[optimization] flags are enabled at each level, and hand-pick those that cannot cause any harm [5].
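For illustration, the stack-to-heap conversion in question looks roughly like the sketch below. It is not taken from the Gales patches; the function names are made up, and I am assuming the warning comes from something like GCC's -Wstack-usage= or -Wvla, without having checked which flag Gales actually sets.

```c
#include <stdlib.h>
#include <string.h>

/* Stack version: the buffer size is known only at run time, so the
 * compiler cannot bound the frame size and a stack-usage warning fires. */
void reverse_vla(char *s, size_t n)
{
    char tmp[n ? n : 1];            /* variable-length array */
    size_t i;

    for (i = 0; i < n; i++)
        tmp[i] = s[n - 1 - i];
    memcpy(s, tmp, n);
}

/* Heap version: no unbounded frame, but allocation can now fail, so the
 * function needs a return value that every caller must check. */
int reverse_heap(char *s, size_t n)
{
    char *tmp = malloc(n ? n : 1);
    size_t i;

    if (tmp == NULL)
        return -1;
    for (i = 0; i < n; i++)
        tmp[i] = s[n - 1 - i];
    memcpy(s, tmp, n);
    free(tmp);
    return 0;
}
```

The interface change is the part that deserves the double-checking: a void function acquires an error path, and each call site has to be patched to handle it.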
Bootup and Usage
The boot time of the system is really impressive: I was greeted with a login prompt a few seconds before the kernel finished initializing its subsystems.
Gales is a statically linked distribution, but its storage footprint is very low: 8 MB for the base filesystem, 76 MB for the tree with compilers. Syncing gports immediately downloads ~560 MB of sources, though.
Compared with Cuntoo, the whole system is certainly simpler and much more intellectually approachable. The feel of the system is very different: daemontools instead of SysV init, a minimalistic toolset, and a different file system hierarchy. After installation, you immediately see a few more directories in the file system root [6]:
- /command - binaries are moved here according to a not really clear criterion (daemontools components only so far).
- /gales - package and distribution service scripts. In more detail:
  - /gales/command - symlinks to package-installed binaries (i.e. foo -> /gales/pkg/$pkg-$major.$minor-$heathen_version/bin/foo). Packages from both the base archive and ports end up here, so with a stretch it can be said that Gales follows the Linux way here, not the BSD way. A stretch, because the package manager is so simple that it's hard to say it "manages" anything.
  - /gales/pkg - contains the installed packages, in a hierarchy of one directory per package.
- /package - sources of daemontools only so far? If this directory is for source-built components, why not have the init(1) source there?
- /service - services to be launched by daemontools.
- /usr - a symlink to /
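One consequence of the /gales/command layout is that the owning package can be read straight off a symlink target. A small sketch, assuming only the path pattern quoted in the list above; package_of and the example directory name are hypothetical, and this is not part of the Gales tooling.

```c
#include <string.h>

#define PKG_ROOT "/gales/pkg/"

/* Hypothetical helper: given the target of a /gales/command symlink,
 * copy the package directory name into out. Returns 0 on success, or -1
 * if the target is not under PKG_ROOT or does not fit in the buffer. */
int package_of(const char *target, char *out, size_t outsz)
{
    size_t rootlen = strlen(PKG_ROOT);
    const char *start, *end;
    size_t len;

    if (strncmp(target, PKG_ROOT, rootlen) != 0)
        return -1;
    start = target + rootlen;
    end = strchr(start, '/');       /* end of the package directory name */
    if (end == NULL || end == start)
        return -1;
    len = (size_t)(end - start);
    if (len >= outsz)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

In practice the target string would come from readlink(2) on an entry in /gales/command.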
A feature that I liked a lot is that shell is the only scripting language in the default install of the distribution. Typically perl and python get pulled in unconditionally as a build dependency of a runtime dependency of some rarely used default-installed utility, or are used directly to implement the package manager, etc. With Gales, the decision about which scripting language to use can be made without constraints created by the ready availability of python or others.
The lack of a man pager in the default installation [7] is a strange decision. While mandoc is certainly nicer than mandb, I would definitely see fixing this as one of the priorities. Using 'man -l full-path-to-man' is just WTF; if that is the only option, one could just as well convert all docs to text and use 'more' to read them.
The decision to install configuration files in /etc/examples is a good one; the problems it can create should only manifest on distributions with an extensive software selection and regular unconditional functionality-breaking upgrades.
In general, I can well see where this system is targeted, and why it must have worked out nicely so far: with a minimal amount of software installed, the whole system can stay comprehensible and suitable for security-sensitive computing. OTOH, I wonder how the package management system will work out if things like Apache or imagemagick get installed, and how comprehensible the system will stay. The ports documents already point out the installation order for some of the libraries and applications, and with larger software this may become inconvenient.
- [1] I ran half of the bootstrapping procedure on a Debian 9 machine with GCC 5; on discovering this issue I applied the fix, but then restarted everything on the Cuntoo machine.
- [2] Even given that the BIOS does support "legacy" booting off M.2 drives!
- [3] There was some confusion about which version of lilo got moved where, which got resolved by purging all lilos with fire and rebuilding, giving as a result a file that matched one of the earlier hashes.
- [4] Or /dev nodes manually created? There is both /dev/ptmx and /dev/pts/ptmx, which hints that this mountpoint may not be necessary.
- [5] This will likely require a very deep dive into GCC, so it may not be a very practical option, given GCC's size.
- [6] I think writing something similar to hier(7) would be beneficial.
- [7] There is mandoc in the gports tree, but as the README helpfully informs, it does not work nicely with the Gales directory hierarchy.
Thanks for your general thoroughness as always, and for the GCC and LILO patches in particular, and congrats on getting it built and installed!
Regular unconditional functionality-breaking "upgrades" reads like a problem in itself.
I think the way both Jacob and myself are viewing Gales atm is it got us this far, primarily on his own steam, and is now a stepping stone to TMSR OS, under which every bit of software is under a V-tree. Gales' simplicity and shell as only scripting lang on base system is a good start, but will for sure have to change as "everything under V" is better understood and specified.
As far as some of the design choices you have questions/comments about, I'll defer to Jacob.
> the O0-O1-O2 optimization level selection in both bootstrap and gports looks a bit arbitrary
Officially, GCC "optimization levels" come with a list of passes enabled by each level. The question is indeed how up-to-date is that list and whether there are any passes that could break program semantics. Unfortunately, determining this is a project in itself, and it would require a GCC expert, or at least someone with the guts and patience to dive into the codebase.
Also, AFAIK anything up to and including -O2 should "be safe", in the sense that the undefined behaviours are deterministic enough so that the compiler can build e.g. the Linux kernel without causing any breakage between e.g. assembly and C code, or between various interfaces. -O3 however will optimize aggressively, to the point that it can break stuff such as threading and structure layout between libraries.
Thanks for the testing & review!
The makeinfo thing just looks like a stylistic warning; do you know why it broke the build? But a larger problem is that the output of the bootstrap is supposed to be independent of the original host, which means any generated files need to be generated by the included tools. (Even if we patch to support however many makeinfo versions, I expect they'll produce slightly different output.) Texinfo could be added to the build tools, but requires Perl which I'm not seeing as justifiable for this. So, better to patch the build system to stop trying to rebuild the .info files, and use the shipped ones as happens when the host lacks makeinfo, unless or until someone finds a better way.
The gnu_inline problem is caused by change of default standard in GCC 5 if I'm reading correctly. The stronger fix then would seem to be adding -std=c89 to the host compiler CFLAGS (BUILD sections 3.3 and 3.5).
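For background, GCC 5 did change the default C dialect from gnu89 to gnu11, and the two disagree on what a plain `inline` definition means: under gnu89 it also emits an externally visible definition, under C99 and later it does not. A small sketch of how code can sidestep or detect the difference follows; the function names are illustrative, and nothing here is taken from the bootstrap sources.

```c
/* `static inline` means the same thing under both dialects, so helpers
 * written this way are unaffected by the compiler's default standard. */
static inline int clamp_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* GCC predefines one of these two macros depending on which inline
 * semantics are in effect for the current translation unit. */
const char *inline_semantics(void)
{
#if defined(__GNUC_GNU_INLINE__)
    return "gnu89";
#elif defined(__GNUC_STDC_INLINE__)
    return "c99";
#else
    return "unknown";
#endif
}
```

Code that relies on the gnu89 meaning of `extern inline` is the kind that breaks when the default flips, which is why pinning the standard for the host compiler addresses the cause rather than one symptom.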
I'd add the skeleton-.sh and gales-util-.sh shell archives to that list. The gports tarball step is something I'd remove; it's properly a subtree of the overall Gales repository and better to have the whole thing.
I'm not up enough on Lilo to judge that NVMe patch offhand but glad to hear it works. Seems odd that it needs to be quite such a special case but Lilo does seem to have a lot of that... how to put it... need for reassurance that things are OK and it can do as told.
Yep; for the curious, this and some other cleanups are found in base/patches/lilo-24.0-cross.patch in the repository archive.
From http://logs.ossasepia.com/log/ossasepia/2020-01-12#1014924 : "certainly a documentation TODO, but safe to skip as dorion_road said. Since for better or worse the gports system works by producing intermediate package files, I figured there'd better be a way to sign & verify them. The step in question is for if you want to include a public key for that in install media."
The signing mechanism uses GPG detached signatures alongside the package file and trusts pubkeys in /etc/gpkg-wot/. IIRC it does work; I don't find myself using it, though I don't have the kind of large fleets that might benefit these days. One thought is that the perceived need for binary packages in the first place is an indication that the build systems are doing way too much.
The /dev/pts mount point is included in the example fstab. Other distros tend to mount it from init scripts before fstab is even consulted. It's also necessary for any sort of screen/tmux/xterm. See pts(4). Not sure what's up with the redundant /dev/ptmx and /dev/pts/ptmx ; I'm seeing the same on Gentoo.
I'm tending to think this was a doomed effort; as you point out there's no reliable replacement for alloca / VLAs in some cases, and their use is rampant otherwise, so the compiler had better make sure that guard pages actually work - which presently it does not. (Some background.)
I think there's some inherent arbitrariness here. How would you propose to decide that "cannot cause any harm", and to what? Besides weighing the odds of compiler bugs, there's code with technically undefined behavior by strict interpretation of standards but that was fine in practice until compilers reached a certain threshold of squeezing blood from rocks. My general approach is to use -O1 by default and -O2 for performance-sensitive things, unless there are suspected problems or it's a particularly security-sensitive thing.
/package, /command and /service are direct from DJB; that's how daemontools is installed and the distributor is discouraged from breaking cross-system path compatibility. There's a number of other packages that picked up the convention, though not included here. /gales/pkg and the others are my variation on the ideas, which I use for new things and porting FHS-style software. Agreed that documenting the hierarchy would be nice.
I don't find the base installation especially habitable anyway. The idea was just enough cross-compiling to get a system that installs and boots to a freestanding environment suitable for further builds. Perhaps another instance of a Linux rather than BSD approach. That said, now that I've tried mandoc it does seem reasonably simple and a good candidate for adding to the bootstrap. I did work out a better workaround for the man path:
Then assuming man-pages and man-pages-posix are installed, move their entries to the end of the file so more specific manuals are not shadowed by generic ones (such as for example "man" itself).
Hm. Do you mean you see no value in the whole troff formatting system outside the search aspect of "man"? Granted I'd much rather write in HTML or practically any other format... but the existing content is there and I figured it was worthwhile to support.
Further automation is certainly possible, as seen with apt-get on top of dpkg or yum on rpm. Unclear to me that it's desirable just yet, if we can move things in the direction of self-contained V trees. What do you see as particularly onerous about Apache or imagemagick? I did get nginx working fine after all (although I now hear it's uncool), and with a somewhat stripped-down PHP too.
You're welcome.
Looks like makeinfo treats the warning as a critical error. I think that not rebuilding the docs is indeed a better solution.
It seems from the patch that Lilo has trouble with some classes of devices (MAJOR_SATA1, MAJOR_SATA2), so such errors can appear with more than just NVMe devices.
I see, apparently the code for verification exists but there was no documentation for signing.
Hm, I wonder what went wrong there then. It is indeed in /etc/examples/fstab - either I fat-fingered and removed that line when editing /etc/fstab, or something else failed.
The problem is also that it's not only about VLAs, but also about large on-stack allocations - which also calls for correctly working guard pages.
Well, there are some inlining-related passes that would be interesting to investigate for potential issues with UB and to enable later (inlining is typically responsible for biggest speedups), while stuff like -fdelete-null-pointer-checks looks like an absolute no-go. Also, IIRC GNAT adds several other passes and options (something like -fpreserve-control-flow) that might be useful.
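To illustrate why -fdelete-null-pointer-checks gets singled out, here is the classic pattern it affects. This is a sketch with made-up names, not code from any Gales package; whether a given compiler version actually removes the check depends on the optimization level and target.

```c
#include <stddef.h>

struct conn { int fd; };

/* Risky ordering: the dereference comes first, so the compiler may
 * conclude that c cannot be NULL and delete the check that follows. */
int fd_risky(struct conn *c)
{
    int fd = c->fd;
    if (c == NULL)
        return -1;
    return fd;
}

/* Safe ordering: the check precedes any dereference, so there is no
 * undefined behaviour for the optimizer to exploit. */
int fd_safe(struct conn *c)
{
    if (c == NULL)
        return -1;
    return c->fd;
}
```

Only fd_safe may be called with NULL. Calling fd_risky with NULL is undefined behaviour at any optimization level; the flag only changes how visibly it misbehaves.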
Not sure about bootstrap, but a script to regenerate the mandoc config after installation, to be invoked only manually, would be a useful addition. I have tested your snippet and it works fine.
The comment was about the fact that if the default man page installation is broken, then docs must be shipped in a readable form anyway. With a working mandoc, I see no issue with the troff format.
IMO dependency management is a big can of worms that is better squashed without opening, so finding some way other than "more automation" should be a goal. Apache and imagemagick were meant as examples of packages with a rather extensive set of dependencies.
@spyked:
Linux comes with a set of flags in its Makefiles that make it safe to build the kernel with a newer GCC; otherwise it would have been broken with very high probability. And yes, O3 is known to produce code even worse than O2.