booting

I have a mildly complicated boot setup that loops around to being simple.

                                sbctl secure boot
┌──────────────┐  ┌/dev/sda1─►/efi───────────────┐ ╭       ╮
│              │  │ 2. Unified Kernel Image      │ │made   │
│1. Just simple│  │ (no intermediate bootloader) │ │with   │
│laptop UEFI   ├─►│┌───────┐  ┌─────────────────┐│ │dracut.│
│(OEM default!)│  ││efistub├─►│tweaked initramfs││ ╰       ╯
│              │  │└───────┘  └─────────────────┘├─┐
└──────────────┘  └──────────────────────────────┘ │ 
                                                   │ password.
┌/dev/sda2─►/dev/mapper/cryptroot───────────────┐  │ mounts then
│ 3. LUKS2 encryption (argon2id algorithm)      │  │ switches
│┌/dev/mapper/cryptroot─►various───────────────┐│  │ root
││ 4. BTRFS partition (flat subvolume layout)  ├╯  │
││Subvolume         Mount location             │◄──┘
││@           ────► /                          ├╮
││@home       ────► /home                      ││
││@var_log    ────► /var/log                   ││
││@snapshots  ────► /.snapshots                ││
│└─────────────────────────────────────────────┘│
└───────────────────────────────────────────────┘

There was a gotcha I had to write a Dracut module for, but this setup is otherwise a typical product of dodging an intermediate bootloader.

Below, I'll discuss each step of the startup diagram in boot order (rather than in the chronological order of the setup commands).

Boot process

1. UEFI

My ASUS laptop's UEFI firmware ⇗ happens to be standards-compliant. Hence, the UEFI can boot straight into the kernel via efistub, as long as the UEFI keeps a boot entry for it (discussed in §2).

Note: The Gentoo wiki warns ⇗ that not all motherboards are fully UEFI-compliant. Too bad: they'll need to chainload an intermediate bootloader (GRUB2, systemd-boot, etc.) instead.
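To show what "a boot entry for it" means, here is a minimal sketch that only prints an efibootmgr command registering a UKI. In this setup uefi-mkconfig (§2) handles that bookkeeping, so this is not how my entry was made. The sketch assumes the ESP is mounted at /efi. The helper names, disk, partition number, and label are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: build (but don't run) an efibootmgr command for a UKI.
# Assumes the ESP is mounted at /efi; helper names are hypothetical.

# Firmware paths are relative to the ESP root and use backslashes:
# /efi/EFI/Linux/x.efi -> \EFI\Linux\x.efi
esp_loader_path() {
    printf '%s\n' "${1#/efi}" | tr '/' '\\'
}

# $1: UKI path under /efi, $2: disk holding the ESP,
# $3: ESP partition number, $4: label shown in the firmware boot menu
efibootmgr_cmd() {
    uki="$1" disk="$2" part="$3" label="$4"
    printf "efibootmgr --create --disk %s --part %s --label '%s' --loader '%s'\n" \
        "$disk" "$part" "$label" "$(esp_loader_path "$uki")"
}

# Usage (review the output before running it as root):
#   efibootmgr_cmd /efi/EFI/Linux/hash-1.2.3-gentoo-dist.efi /dev/sda 1 Gentoo
```

Printing instead of executing keeps the sketch harmless. Writing to the firmware's boot variables is not something to do by accident.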

The UEFI also lets me proudly use secure boot.
Why use secure boot? It's my answer to this question:

The EFI System Partition (ESP) cannot be encrypted ⇗. So how do you safely* boot from an unencrypted ESP's boot application into an encrypted root?
*according to your threat model

My answer to this question evolved over time. (Skip if you don't care about the thought process.)

old ◀────► new
  1. Unencrypted /boot, normal bootloader: an easy/lazy setup, but 2.4istan ⇗ is easy.
  2. Unencrypted ESP; GRUB2 loads an argon2id-encrypted /boot kernel: a determined attacker able to 2.4istan will probably still mess with the EFI application in /efi, and no one would ever know. Decryption also needs to run twice ⇗:
    1. GRUB2 decrypts the partition holding /boot
    2. GRUB2 loads /boot's kernel and initramfs, then passes away
    3. The kernel decrypts the root partition (even if /boot was already on it), then does mount+switch_root
  3. Secure boot with a Unified Kernel Image (UKI): the kernel has no "personal information", so a good integrity check is better than hiddenness. (Someone can still remove the CMOS ⇗, though.) UKI with secure boot dodges needing:

In practice, app-crypt/sbctl ⇗ manages my secure boot keys.


And finally, the UEFI menu is password-protected.
(Imagine enabling secure boot but letting anyone press F2 and switch it off.)

Takeaways: At any time convenient to you (e.g. before or after your first successful boot):

  1. Add a UEFI password
  2. Put UEFI Secure Boot into setup mode ⇗, then boot
  3. Emerge app-crypt/sbctl
  4. Follow the sbctl README ⇗ and manpage. If sbctl can't find /efi, set ESP_PATH (issues/207 ⇗)
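Roughly, step 4 boils down to the sbctl commands below. This is a sketch, not a script to run blindly. It assumes the firmware is already in Setup Mode and that you pass the path of your UKI. DRY_RUN is a hypothetical switch of this sketch (not an sbctl feature) that prints each command instead of running it. Keeping Microsoft's keys with -m is an assumption too. Some firmware and option ROMs depend on them, so read the sbctl README before dropping it.

```shell
#!/bin/sh
# Sketch of the sbctl takeaways; assumes sbctl is installed and the
# firmware is in Setup Mode. DRY_RUN=1 (hypothetical, sketch-only)
# prints the commands instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: the UKI the firmware should boot, e.g. /efi/EFI/Linux/<name>.efi
setup_secure_boot() {
    uki="$1"
    run sbctl status            # confirm Setup Mode before changing anything
    run sbctl create-keys       # generate your own PK, KEK and db keys
    run sbctl enroll-keys -m    # enroll them; -m also keeps Microsoft's keys
    run sbctl sign -s "$uki"    # sign the UKI and save it for future re-signing
    run sbctl verify            # list which files on the ESP are signed
}

# Usage:
#   DRY_RUN=1 setup_secure_boot /efi/EFI/Linux/hash-1.2.3-gentoo-dist.efi
```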

2. Unified Kernel Image

In a normal boot setup ⇗:

  1. The UEFI firmware performs:
  2. The bootloader finds and loads the kernel (e.g. vmlinuz ⇗) and attaches the initramfs ⇗ (a small RAM rootfs with /dev, /bin, etc.)
  3. The kernel loads hardware modules; the RAM rootfs runs fsck and mounts partitions
  4. The RAM rootfs is freed ⇗ after switch_root ⇗

With a Unified Kernel Image (UKI), the kernel, the initramfs, and the kernel command line are bundled with an EFI stub into one EFI executable. Instead of a separate some_bootloader.efi, vmlinuz, and initramfs, we have one file, e.g. /efi/EFI/Linux/hash-1.2.3-gentoo-dist.efi.

On my system:

  1. The UEFI firmware runs through:
  2. efistub (not an intermediate bootloader ⇗) loads the kernel and initramfs kept within the UKI. My UKI is made with
    1. Dracut (choice discussed later)
    2. gentoo-kernel-bin (discussed in kernel ⟹)
    3. installkernel, via the kernel-install USE flag
  3. The kernel loads hardware modules; the kernel and initramfs decrypt my LUKS2 partition (discussed in §3) and mount the subvolumes of the BTRFS filesystem inside it onto /sysroot (discussed in §4).
  4. After a default Dracut module (a shell script) checks that /sysroot looks sane, it runs switch_root into /sysroot.
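You can check the "one file" claim yourself. A UKI is a PE executable whose sections carry the kernel (.linux) and the initramfs (.initrd), usually alongside .cmdline and .osrel. The minimal sketch below reads `objdump -h` output and checks for the two essential sections. It assumes binutils was built with PE support, which is typical on x86_64. The function names are hypothetical. Reading from stdin keeps the sketch testable without a real UKI.

```shell
#!/bin/sh
# Sketch: does an EFI binary look like a UKI? Feed it `objdump -h` output.

# objdump -h prints "Idx Name Size VMA ..." rows whose first field is a
# numeric index; each is followed by a flags line ("CONTENTS, ALLOC, ...").
list_pe_sections() {
    awk '$1 ~ /^[0-9]+$/ { print $2 }'
}

# Succeeds only if both the kernel and the initramfs sections are present.
is_uki() {
    sections="$(list_pe_sections)"
    printf '%s\n' "$sections" | grep -qx '\.linux' &&
        printf '%s\n' "$sections" | grep -qx '\.initrd'
}

# Usage:
#   objdump -h /efi/EFI/Linux/hash-1.2.3-gentoo-dist.efi | is_uki && echo "looks like a UKI"
```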

Takeaways:

As a consequence ⇗ of Dracut, I need systemd-utils*. Otherwise, follow the Gentoo wiki ⇗, but install uefi-mkconfig instead of kernel-bootcfg. Then reinstall the Linux kernel.

*Alternatively, you could probably hand-roll your own UKI to dodge the Dracut/kernel-install dependency.
/etc/portage/package.use/kernel
sys-kernel/installkernel uki efistub dracut
sys-apps/systemd-utils boot kernel-install
/etc/portage/package.accept_keywords/uefi-mkconfig
sys-boot/uefi-mkconfig ~amd64

3. LUKS2 encryption

I use Linux Unified Key Setup (LUKS2) to wrap my BTRFS drive (discussed in §4) with encryption. For safety, I back up ⇗ the LUKS header offsite.

Refresher: What's LUKS? Skip if familiar.

/dev/sda is a block device: a special file that gives you access to some hardware "device", like a SATA SSD (NVMe drives show up as /dev/nvme0n1 instead). Arch Wiki ⇗

Anything in /dev/mapper is a "virtual" block device -- it's a kind of middleman between you and the actual hardware block device. This is handled by the kernel "device mapper" subsystem. Wikipedia ⇗

When you run cryptsetup, you call into the dm-crypt ⇗ subsystem, which uses the kernel "device mapper" and the kernel crypto API to:

  1. luksFormat: Write a header (the LUKS header) onto the block device to store metadata such as the cipher and key slots. ArchWiki ⇗
  2. luksOpen: Unlock the block device (via passphrase or keyfile), then map its decrypted view to a virtual block device in /dev/mapper.

This facilitates transparent (live) encryption:

    ╭────────────╮               ╭───────────╮
    │   mapped   │               │  mounted  │
    ┴            ▼               ┴           ▼
/dev/sda2      /dev/mapper/something       /home/somebody
    ▲            ┬               ▲           ┬       ▲
    │write actual│ translate msg │msg sent to│       │
    │crypted data│ w/ encryption │ middleman │    You write
    ╰────────────╯               ╰───────────╯    to a file
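To make the two cryptsetup steps concrete, here is a sketch that prints the commands rather than running them, because luksFormat irreversibly overwrites whatever is on the target. The device, mapping name, and header-backup path are example arguments, and luks_plan is a hypothetical helper name. `--pbkdf argon2id` just spells out the algorithm from the diagram. The last line corresponds to the offsite header backup mentioned above.

```shell
#!/bin/sh
# Sketch: print the cryptsetup steps for review instead of executing them.
# $1: partition (e.g. /dev/sda2), $2: mapping name (e.g. cryptroot),
# $3: file to receive the LUKS header backup (copy it offsite afterwards)
luks_plan() {
    dev="$1" name="$2" backup="$3"
    # 1. write a LUKS2 header, deriving the key from the passphrase with argon2id
    printf 'cryptsetup luksFormat --type luks2 --pbkdf argon2id %s\n' "$dev"
    # 2. unlock it; the decrypted view appears at /dev/mapper/<name>
    printf 'cryptsetup luksOpen %s %s\n' "$dev" "$name"
    # back up the header: if it gets damaged, the data is unrecoverable
    printf 'cryptsetup luksHeaderBackup %s --header-backup-file %s\n' "$dev" "$backup"
}

# Usage:
#   luks_plan /dev/sda2 cryptroot /root/luks-header.img
```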

In order to boot with this setup, the initramfs must know:

  1. that we need to decrypt /dev/sda2 and map it to /dev/mapper/cryptroot
  2. where to mount /dev/mapper/cryptroot's subvolumes

Since I use Dracut for the initramfs, I followed the Dracut section on disk encryption in the Gentoo wiki ⇗. I also added rd.luks.name (to map the volume as /dev/mapper/cryptroot) and rd.luks=1 for convenience. (See the Arch wiki ⇗ and the dracut(8) manpage ⇗.)

Takeaways:

/etc/dracut.conf
# part 1: enable cryptsetup
add_dracutmodules+=" dm crypt "
kernel_cmdline+=" rd.luks.uuid=ad5e66e7-e890-4565-95f1-37f27600c8d4 \
rd.luks.name=ad5e66e7-e890-4565-95f1-37f27600c8d4=cryptroot \
root=UUID=310bf892-743d-4e57-b48d-4ccaa4265416 rd.luks=1 "
# Don't copy my UUIDs in the /etc/dracut.conf kernel command line!
# Find and specify your own for rd.luks.uuid=X and root=UUID=X
sudo lsblk -o name,fstype,uuid,mountpoints
NAME          FSTYPE      UUID                                 MOUNTPOINTS
sda
├─sda1        vfat        5085-8014                            /efi
└─sda2        crypto_LUKS ad5e66e7-e890-4565-95f1-37f27600c8d4
  └─cryptroot btrfs       310bf892-743d-4e57-b48d-4ccaa4265416
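Instead of copying UUIDs by hand, a small helper can pull them from lsblk's raw output and print the kernel_cmdline line for /etc/dracut.conf. This is a sketch with hypothetical function names. It assumes exactly one crypto_LUKS partition and one btrfs filesystem are visible, and it hardcodes the cryptroot mapping name used above.

```shell
#!/bin/sh
# Sketch: derive the dracut.conf kernel_cmdline from
# `lsblk -rno NAME,FSTYPE,UUID` output read on stdin.

# Print the UUID (3rd field) of the first row whose FSTYPE equals $1.
uuid_of_fstype() {
    awk -v t="$1" '$2 == t { print $3; exit }'
}

dracut_luks_cmdline() {
    listing="$(cat)"
    luks="$(printf '%s\n' "$listing" | uuid_of_fstype crypto_LUKS)"
    root="$(printf '%s\n' "$listing" | uuid_of_fstype btrfs)"
    # refuse to print a half-filled line if either UUID is missing
    [ -n "$luks" ] && [ -n "$root" ] || return 1
    printf 'kernel_cmdline+=" rd.luks.uuid=%s rd.luks.name=%s=cryptroot root=UUID=%s rd.luks=1 "\n' \
        "$luks" "$luks" "$root"
}

# Usage (the LUKS volume must be open so the btrfs UUID is visible):
#   lsblk -rno NAME,FSTYPE,UUID | dracut_luks_cmdline
```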

4. BTRFS

BetterFS is a fantastic filesystem for desktops.

BtrFS compared with:

ext4
  Better (■-■¬): BtrFS supports transparent compression ⇗, often doubling de facto disk space and reducing writes. Benchmarks ⇗ (Compression is my primary use case.) BtrFS subvolumes also make incremental snapshotting a cakewalk (BtrFS docs ⇗), though actual backups are more danger-resistant.
  Worse (◞‸◟;): ext4 is super battle-tested (though Facebook uses BtrFS; article ⇗).
xfs
  xfs is more for parallel workloads at scale ⇗, so idk.
zfs
  Better (■-■¬): BtrFS uses less RAM ⇗ (my ZFS dealbreaker), and it's in-tree and easier to set up.
  Worse (◞‸◟;): idk. Ask a FreeBSD user.

I use a "flat" subvolume layout. Sysadmin guide ⇗.

As an example, here's an approximation of how my BtrFS setup was made. It doesn't include the earlier steps, i.e. cryptsetup, mkfs.btrfs, and emergence of relevant packages such as btrfs-progs:

# Something I find odd: subvolumes are created on a mounted
# filesystem, not on the device, so mount the top level first.
cryptdevice=cryptroot  # the name given to cryptsetup luksOpen
esp=/dev/sda1          # probably; check with lsblk

mount -t btrfs /dev/mapper/$cryptdevice \
  --mkdir /mnt

btrfs subvolume create /mnt/@
btrfs sub c /mnt/@home
btrfs sub c /mnt/@var_log
btrfs sub c /mnt/@snapshots
umount /mnt

# I use noatime ⇗ to stop disk writes that
# I don't care about (i.e. access time writes)
mount -t btrfs -o \
  noatime,compress=zstd,subvol=@ \
  /dev/mapper/$cryptdevice \
  --mkdir /mnt
mount -t btrfs -o \
  noatime,compress=zstd,subvol=@home \
  /dev/mapper/$cryptdevice \
  --mkdir /mnt/home
mount -t btrfs -o \
  noatime,compress=zstd,subvol=@var_log \
  /dev/mapper/$cryptdevice \
  --mkdir /mnt/var/log
mount -t btrfs -o \
  noatime,compress=zstd,subvol=@snapshots \
  /dev/mapper/$cryptdevice \
  --mkdir /mnt/.snapshots

# EFI System Partition
mount $esp --mkdir /mnt/efi
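The Dracut module further down mounts whatever /etc/fstab lists, so the fstab has to mirror the mounts above. Here is a hypothetical generator for those entries, not my actual fstab. The fields are tab-separated because the module's sed expressions match tabs. The UUIDs are arguments you would take from lsblk. The ESP's umask=0077 option and the function name are assumptions.

```shell
#!/bin/sh
# Sketch: print tab-separated /etc/fstab entries for the flat layout.
# $1: UUID of the btrfs filesystem (cryptroot in lsblk)
# $2: UUID of the EFI System Partition
flat_btrfs_fstab() {
    fs="UUID=$1"
    opts="noatime,compress=zstd"
    # subvolume:mountpoint pairs, matching the mount commands above
    for pair in @:/ @home:/home @var_log:/var/log @snapshots:/.snapshots; do
        subvol="${pair%%:*}"   # text before the first ':'
        target="${pair#*:}"    # text after the first ':'
        # btrfs gets fsck pass 0: it has no boot-time fsck to run
        printf '%s\t%s\tbtrfs\t%s,subvol=%s\t0 0\n' "$fs" "$target" "$opts" "$subvol"
    done
    printf 'UUID=%s\t/efi\tvfat\tumask=0077\t0 2\n' "$2"
}

# Usage:
#   flat_btrfs_fstab 310bf892-743d-4e57-b48d-4ccaa4265416 5085-8014 >> /mnt/etc/fstab
```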

Unfortunately, because of the flat subvolume layout shown above, Dracut doesn't think that /sysroot looks sane at boot time.

I hadn't run into this before, but after some trial and error in the Dracut sources, I wrote my own Dracut module to fix it. (Dracut modules are just shell scripts with four or so expected functions.) And of course, to keep it managed by my package manager, I added it as a Portage patch:

Takeaways:

A flat BtrFS subvolume layout might need finagling:

/etc/dracut.conf
add_dracutmodules+=" actually-normal-fstab "
use_fstab="yes"
add_fstab+=" /etc/fstab "
/etc/portage/patches/sys-kernel/dracut/actually-normal-fstab-module.patch
diff --git a/modules.d/99actually-normal-fstab/module-setup.sh b/modules.d/99actually-normal-fstab/module-setup.sh
new file mode 100755
index 00000000..a9607e19
--- /dev/null
+++ b/modules.d/99actually-normal-fstab/module-setup.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+
+# called by dracut
+check() {
+    return 0
+}
+
+# called by dracut
+depends() {
+    echo fs-lib
+}
+
+install() {
+    inst_hook mount 99 "$moddir/mount-normal-fstab.sh"
+}
diff --git a/modules.d/99actually-normal-fstab/mount-normal-fstab.sh b/modules.d/99actually-normal-fstab/mount-normal-fstab.sh
new file mode 100755
index 00000000..8b6358f5
--- /dev/null
+++ b/modules.d/99actually-normal-fstab/mount-normal-fstab.sh
@@ -0,0 +1,34 @@
+#!/bin/sh
+
+set -x
+
+type getarg > /dev/null 2>&1 || . /lib/dracut-lib.sh
+type det_fs > /dev/null 2>&1 || . /lib/fs-lib.sh
+
+fstab_mount() {
+    test -e "$1" || return 1
+    info "Mounting from $1"
+    # Don't use --target-prefix since it will fail to mount '/' (already 'mounted')
+
+    sed -e '/\t\/efi\S*/d' -e '/btrfs defaults 0 0/d' -e 's/\t\//\t\/sysroot\//' "$1" > /tmp/fstab
+    mount --all --fstab /tmp/fstab
+    return 0
+}
+
+# systemd will mount and run fsck from /etc/fstab and we don't want to
+# run into a race condition.
+if [ -z "$DRACUT_SYSTEMD" ]; then
+    [ -f /etc/fstab ] && fstab_mount /etc/fstab
+fi
+
+# prefer the new root's fstab over the initramfs's own /etc/fstab
+if [ -f "$NEWROOT"/etc/fstab ]; then
+    fstab_mount "$NEWROOT"/etc/fstab
+elif [ -f "$NEWROOT"/\@/etc/fstab ]; then
+    # in case of btrfs flat volumes, where root is a "@" subvol
+    fstab_mount "$NEWROOT"/\@/etc/fstab
+elif [ -f /etc/fstab ]; then
+    fstab_mount /etc/fstab
+fi
+
+set +x
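To preview what mount-normal-fstab.sh will hand to `mount --all`, you can replay the same sed expressions on a running system. The sketch below copies them verbatim. It deletes any line with a tab followed by /efi and any line containing "btrfs defaults 0 0". It then rewrites the first tab followed by / on each line, which is normally the mount point, so that the path starts with /sysroot/. It needs GNU sed, because \t and \S are GNU extensions, and a tab-separated fstab. The function name and the sample fstab in the test are hypothetical.

```shell
#!/bin/sh
# Sketch: apply the module's fstab rewrite outside the initramfs.
# $1: path to an fstab (tab-separated fields); requires GNU sed.
preview_sysroot_fstab() {
    # 1) drop the ESP (not needed to switch root)
    # 2) drop lines containing "btrfs defaults 0 0"
    # 3) prefix the mount point with /sysroot
    sed -e '/\t\/efi\S*/d' \
        -e '/btrfs defaults 0 0/d' \
        -e 's/\t\//\t\/sysroot\//' "$1"
}

# Usage:
#   preview_sysroot_fstab /etc/fstab
```

For example, a `/home` entry comes out as `/sysroot/home`, and the `/` entry comes out as `/sysroot/`.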

Conclusion

I hope this isn't just a Gentoo-specific problem that I solved with Gentoo-specific tools.

But if it is, I don't regret it.

Maybe I should have used µgRD ⇗ for the initramfs instead? It's designed specifically for Gentoo users while also being reasonably cross-distribution.

Anyway, that's my Unified Kernel Image-based secure boot setup. You can do this on any distro, really. Unless your motherboard is wack.