Linux Portability, Part 1: Exploring musl Architecture-Specific Headers

October 31st, 2018

mircea_popescu: i suspect noboduy's getting out of cataloguing the shit that easily, but we see.

First, a summary of a thread.

Initially, I had no intention of taking part in any further discussion of this topic until this article was ready. However, discussion did happen in the logs, and now I can't proceed without summarizing it.

I pointed out that flags and syscall identifiers map to different values across architectures. Maintaining the mappings from #defines to values is the daily work of libc maintainers, and there is no reason why this work can't be done for Ada, for the subset of system calls and flags that sanely written software requires.
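
To make the point concrete, here is a minimal sketch of such a mapping for a single system call, write(2), written in C for brevity. The enum and function names are hypothetical and exist only for this illustration; the numbers are the ones the kernel assigns on these four architectures, and the table is nowhere near complete.

```c
#include <assert.h>

/* Architectures covered by this sketch (hypothetical names). */
enum arch { ARCH_X86_64, ARCH_I386, ARCH_AARCH64, ARCH_MIPS_O32, ARCH_COUNT };

/* Kernel syscall number of write(2) on each architecture. */
static const long sys_write_nr[ARCH_COUNT] = {
    [ARCH_X86_64]   = 1,
    [ARCH_I386]     = 4,
    [ARCH_AARCH64]  = 64,
    [ARCH_MIPS_O32] = 4004,  /* o32 syscall numbers start at 4000 */
};

/* Look up the write(2) syscall number for one architecture. */
long write_syscall_number(enum arch a)
{
    assert((int)a >= 0 && (int)a < ARCH_COUNT);
    return sys_write_nr[a];
}
```

A real mapping would need one such row per system call and per flag, which is exactly the bookkeeping that libc maintainers already do.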

On this topic, asciilifeform commented extensively:

  • He was aware that the differences are present, and did not expect any standardization there.
  • Platform-specific code is unacceptable in user applications, and belongs in the GNAT runtime.
  • Rewriting the C library would be wasted effort, because it would cost too much time and manpower; the practical approach is to rely on the standard C code. [1]
  • In spite of this, he would love to see his udptron run on top of ave1's zero-cost-runtime. [2]
  • All platform-specific constants are unwelcome, whether in C or in Ada, so the amount of such code must be minimized.
  • It would be useful to document the kernel ABI, and not the C stuff.

The rest of this article was supposed to expose the gnarly parts of the kernel ABI, without assuming that the whole world is x86. However, after an initial investigation it turned out that I had overestimated my knowledge of non-x86 architectures, so I proceeded to read the musl source as documentation. I also discovered that just dumping the information I have would be too unstructured for my taste. Instead, I offer you something that approximates "let's read the architecture-specific parts of the musl source together" (though no actual musl source is shown in the article).

I use musl's source as a Linux ABI reference because it is cleanly written, without any clever tricks that are hard to understand, and I already have some experience working with it, albeit only for x86_64. Fishing the same information out of Linux [3] directly would take much more time. Naturally, using a libc as kernel ABI documentation gives zero information about platforms that musl does not support (Itanium, SPARC).

In Part 1, we will have a look at the architecture-specific headers, which contain structure (re)definitions and various defines (system call table, system call flags, signals, etc.). It turned out rather boring compared to Part 2, where we will explore various ifdefs in the C sources. There will also be a Part 3, a summary, which is supposed to be the actually useful thing -- the first two parts are just a work log.
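
As a small taste of what such defines look like once laid side by side, the sketch below tabulates a handful of signal numbers for x86_64 and for mips. The struct, table and function names are made up for this example, and only a few signals are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per signal: its number on x86_64 and on mips. */
struct sig_row {
    const char *name;
    int x86_64;
    int mips;
};

static const struct sig_row sig_table[] = {
    { "SIGKILL",  9,  9 },
    { "SIGSEGV", 11, 11 },
    { "SIGTERM", 15, 15 },
    { "SIGBUS",   7, 10 },
    { "SIGUSR1", 10, 16 },
    { "SIGUSR2", 12, 17 },
    { "SIGCHLD", 17, 18 },
};

#define SIG_ROWS (sizeof sig_table / sizeof sig_table[0])

/* Count the rows whose number differs between the two architectures. */
size_t count_mismatches(void)
{
    size_t n = 0;
    for (size_t i = 0; i < SIG_ROWS; i++)
        if (sig_table[i].x86_64 != sig_table[i].mips)
            n++;
    return n;
}

/* Look a signal up by name; returns -1 if it is not in the table. */
int signal_number(const char *name, int on_mips)
{
    assert(name != NULL);
    for (size_t i = 0; i < SIG_ROWS; i++)
        if (strcmp(sig_table[i].name, name) == 0)
            return on_mips ? sig_table[i].mips : sig_table[i].x86_64;
    return -1;
}
```

Here count_mismatches() returns 4: SIGKILL, SIGSEGV and SIGTERM agree, while SIGBUS, SIGUSR1, SIGUSR2 and SIGCHLD do not.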

To get some things out of the way from the start: I will mostly ignore everything that is platform-specific by design, i.e. structures with register sets and definitions of atomic operations, though I will state in which files they are present.

Yes, most of the article is in the annotations.

    ~/src/musl/arch $ find . -type f  | awk -F/ 'BEGIN {} {t[$NF] = t[$NF] " " $2} END{for (h in t) { print h ":\t" t[h] } }'
    # architecture-specific instruction definitions
    atomic_arch.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [4]
    syscall_arch.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [5]
    pthread_arch.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [6]
    
    # architecture-specific structures (register structures, etc.) and definitions
    reg.h:		i386 or1k x32 mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [7]
    float.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [8]
    setjmp.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [9]
    alltypes.h.in:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [10]
    syscall.h.in:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [11]
    user.h:		i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [12]
    limits.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [13]
    crt_arch.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [14]
    stdint.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [15]
    posix.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [16]
    reloc.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [17]
    signal.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [18]
    endian.h:	i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [19]
    stat.h:		i386 or1k x32 sh mips64 aarch64 arm powerpc x86_64 mips mipsn32 m68k microblaze s390x powerpc64 [20]
    # sigaction kernel interface in  src/internal/ksigaction.h
    ksigaction.h:	x32 sh mips64 x86_64 mips mipsn32 [21]
    
    ioctl_fix.h:	arm generic s390x [22]
    errno.h:	mips64 powerpc generic mips mipsn32 powerpc64 [23]
    ioctl.h:	sh mips64 powerpc generic mips mipsn32 powerpc64 [24]
    termios.h:	mips64 powerpc generic mips mipsn32 powerpc64 [25]
    mman.h:		i386 x32 mips64 powerpc generic x86_64 mips mipsn32 powerpc64 [26]
    hwcap.h:	sh mips64 aarch64 arm powerpc generic mips mipsn32 s390x powerpc64 [27]
    statfs.h:	x32 mips64 generic mips mipsn32 s390x [28]
    shm.h:		x32 sh mips64 aarch64 powerpc generic x86_64 mips mipsn32 s390x powerpc64 [29]
    ptrace.h:	i386 x32 sh mips64 arm powerpc generic x86_64 mips mipsn32 m68k s390x powerpc64 [30]
    sem.h:		mips64 aarch64 powerpc generic mips mipsn32 s390x powerpc64 [31]
    fcntl.h:	x32 mips64 aarch64 arm powerpc generic x86_64 mips mipsn32 m68k s390x powerpc64 [32]
    socket.h:	x32 mips64 aarch64 powerpc generic x86_64 mips mipsn32 s390x powerpc64 [33]
    msg.h:		or1k x32 mips64 aarch64 powerpc generic x86_64 mips mipsn32 s390x powerpc64 [34]
    fenv.h:		i386 x32 sh mips64 aarch64 arm powerpc generic x86_64 mips mipsn32 m68k s390x powerpc64 [35]
    ipc.h:		or1k x32 mips64 aarch64 powerpc generic x86_64 s390x powerpc64 [36]
    link.h:		generic s390x [37]
    io.h:		i386 x32 generic x86_64 [38]
    resource.h:	mips64 generic mips mipsn32 [39]
    poll.h:		mips64 generic mips mipsn32 [40]
    # Portable headers
    kd.h:		 generic
    soundcard.h:	 generic
    vt.h:		 generic
    
1. I disagree with this: selecting a subset that sane applications need and sticking to it should not be that hard. Of course, only practice can show this for sure.
2. As far as I understand at this point, not because it makes any difference from a practical point of view, and not because he wants to get rid of C code, but so that someone would explain how the networking syscall interfaces work?
3. This approach would have the benefit of not being a commentary on a commentary.
4. The file name says it all: definitions of atomic instructions for each architecture.
5. Definitions of system call invocation instructions. Notably, it also contains fixups: all mipses -- for the stat(2) kernel interface, arm -- for thumb2 code generation, sh -- for a hardware bug, x32 -- for struct timespec (the kernel uses a long long integer for the nanosecond field), s390x -- for SYS_mmap (all mmap(2) arguments are passed through the stack).
6. Definitions for getting the thread control block pointer, adjusting the thread pointer when a new thread is created, and setting the program counter for thread cancellation.
7. Register name to number mapping.
8. Constants specific to floating-point hardware.
9. Definition of the type for the setjmp/longjmp buffer.
10. Some type definitions (time_t, suseconds_t, wchar_t, wint_t, all pthread synchronization structures, _Addr, _Int64, _Reg).
11. System call table; these differ per architecture. or1k and aarch64 have ~the same system call tables, ditto mips64 and mipsn32 for the first ~200 system calls (the exotica differ). The rest contain more or less significant differences.
12. Extra ptrace definitions for gdb. Architecture-specific by design, as it contains structures with register sets and related definitions.
13. Definitions of LONG_MAX, LLONG_MAX.
14. Architecture-specific part of the _start symbol definition.
15. Definitions of 'architecture-independent' types and limits. Ironically, the definitions themselves are architecture-dependent.
16. POSIX platform types (_POSIX_V7_LP64_OFF64, _POSIX_V7_ILP32_OFFBIG, etc.).
17. Definitions for dynamic linking.
18. Definitions for signals. Signals on POSIX systems are an extremely ugly abomination that you should not touch if you care about your sanity. The most important signal codes match across architectures; others do not (half of the signal mappings on mipses). And yes, signal context structures contain registers and are platform-specific. All architectures but mipses define 65 signals, mipses -- 128.
19. Defines platform endianness.
20. The st_ino and st_nlink fields of struct stat differ widely across architectures; the other difference is padding, which is also platform-specific.
21. Defines the kernel sigaction structure -- it is different from what is exposed to userspace.
22. Very minor differences, but it's hard to judge whether the list is complete. ioctl codes are typically exposed by the kernel headers, not by the libc.
23. Mipses have different codes above ~35; powerpc has a single different error code.
24. In general, all mipses share their ioctl codes, as do the powerpcs. Hard to tell anything more conclusive -- yes, the codes differ, and the source files are ~undiffable.
25. Mipses have different codes (I have little experience with this part, so I can't tell if important flags are affected, but the difference between the files looks significant). powerpc has a different struct termios layout as well as different flag definitions.
26. x86 only adds some flags, mips has different flags (MAP_ANON, MAP_EXECUTABLE), powerpc redefines some of them (MAP_NORESERVE, MAP_LOCKED, and all mlock flags).
27. Everything is architecture-specific here; some arches just have nothing to report. The file contains definitions of hardware features.
28. mips has a different field order in struct statfs, s390 uses different flags, x32 adds padding.
29. This file contains the definition of struct shmid_ds. arm64, mips, s390, x32, x86 modify padding; powerpc has a different field order. sh changes the SHMLBA definition (memory sharing happens at a granularity of 16384 bytes).
30. Contains architecture-specific defines to access internal CPU-specific state.
31. Big padding differences across architectures in struct semid_ds. The fields are all in the same order, though.
32. Mipses have big differences in the most basic flags.
33. Most architectures have differences wrt. struct msghdr padding and endianness, but specifically mipses -- over setsockopt flags, and even socket types (SOCK_STREAM, SOCK_DGRAM), powerpc -- low-importance setsockopt flags are different, and s390 has one setsockopt flag missing (unless it's a bug).
34. Only padding issues across architectures (struct msqid_ds).
35. Architecture-specific header with definitions related to floating-point unit control.
36. struct ipc_perm padding; IPC_64 is defined for some architectures (it makes the kernel use the struct msqid_ds definition with larger fields).
37. Symbol table type redefinition: switches Elf_Symndx from uint32_t to uint64_t.
38. Definitions for x86-specific port IO instructions.
39. The MIPS rlimit flag order is also different. Most user-written applications don't set rlimits, though.
40. MIPS has arch-specific poll.h flags, but only rarely used ones differ.
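
The socket type difference from note 33 is small enough to sketch in full. The function and constant names below are hypothetical; the values are the kernel's: SOCK_STREAM is 1 and SOCK_DGRAM is 2 on most architectures, and the two are swapped on mips. The sketch deliberately ignores the SOCK_NONBLOCK and SOCK_CLOEXEC bits, which would need their own translation.

```c
#include <assert.h>

/* Socket types as most architectures number them. */
enum { GENERIC_SOCK_STREAM = 1, GENERIC_SOCK_DGRAM = 2 };

/* The same two types as the mips kernel ABI numbers them (swapped). */
enum { MIPS_SOCK_DGRAM = 1, MIPS_SOCK_STREAM = 2 };

/* Translate a generic socket type into its mips value. */
int sock_type_to_mips(int generic_type)
{
    /* Only the bare type is handled; flag bits are out of scope here. */
    assert((generic_type & ~0xf) == 0);

    switch (generic_type) {
    case GENERIC_SOCK_STREAM: return MIPS_SOCK_STREAM;
    case GENERIC_SOCK_DGRAM:  return MIPS_SOCK_DGRAM;
    default:                  return generic_type;  /* e.g. SOCK_RAW (3) matches */
    }
}
```

A portable runtime would keep such a translation in one place, so that application code never sees the raw numbers.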
