A small release with one slightly embarrassing kernel bug, two latent crashes the test harness coughed up, and an end-to-end test that finally passes ten times in a row.
The Escape key never worked
Pressing Escape did nothing. Not in stsh, not in Lumen, not in the GUI installer’s wizard back button, not anywhere. I never noticed because nothing I personally use the kernel for required Esc — but the GUI installer’s back arrow has been broken since Phase 47, and that was my excuse to find out why.
kernel/arch/x86_64/kbd.c, scancode set 1, lookup table:
static const char s_sc_lower[] = {
0, 0, '1', '2', '3', '4', '5', '6', /* 0x00–0x07 */
...
};
Index 0x01 is the Escape make code. It was 0 (NUL). The translation loop has if (c) guarding the buf_push — NUL gets dropped silently. So every Escape keypress in Aegis history was thrown away inside the keyboard handler, before any userspace process ever saw it. Two-character fix:
static const char s_sc_lower[] = {
0, '\033', '1', '2', '3', '4', '5', '6',
...
};
The same hole in s_sc_upper is patched too. Single-character oversight, present since the PS/2 driver was first written in Phase 9. I would love to claim this was caught by some clever fuzzer; in reality the GUI installer’s test harness sent \x1b over QEMU’s sendkey, watched gui-installer not respond, and I spent half an hour adding diagnostic prints in the wrong layer before finally instrumenting the kbd ring.
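To make the failure mode concrete, here is a minimal sketch of a translation step with the same if (c) guard. The tables are truncated to the eight entries shown above, and kbd_translate, ring_push, and struct ring are hypothetical stand-ins for illustration, not the actual contents of kbd.c.

```c
#include <stddef.h>

/* First eight entries of the scancode set 1 table before the fix:
 * index 0x01 (the Escape make code) holds NUL. */
static const char sc_lower_old[8]   = { 0, 0,      '1', '2', '3', '4', '5', '6' };
/* The same entries after the fix: index 0x01 holds ESC. */
static const char sc_lower_fixed[8] = { 0, '\033', '1', '2', '3', '4', '5', '6' };

/* Hypothetical byte ring standing in for the driver's key buffer. */
struct ring { char data[16]; size_t len; };

static void ring_push(struct ring *r, char c)
{
    if (r->len < sizeof r->data)
        r->data[r->len++] = c;
}

/* Translate one make code. Returns 1 if a byte was queued, 0 if dropped.
 * The if (c) guard discards any table slot that holds NUL. */
static int kbd_translate(const char *table, size_t n, unsigned char sc,
                         struct ring *r)
{
    char c;

    if (sc >= n)
        return 0;
    c = table[sc];
    if (c) {
        ring_push(r, c);
        return 1;
    }
    return 0;
}
```

With the old table, kbd_translate(sc_lower_old, 8, 0x01, &r) queues nothing; with the fixed table the same call queues a single 0x1b byte.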
chronos couldn’t open a socket
Every boot, in the serial log:
chronos: socket failed
chronos is the SNTP daemon — it’s supposed to walk up to time.google.com once an hour and clock_settime the kernel wall clock. It hasn’t been working since Phase 46c, the cap-policy redesign. That phase moved capability grants from a hardcoded table into per-binary files in /etc/aegis/caps.d/. Every binary in the system has one — except chronos. The file was never written, so chronos started with only baseline caps, and socket(AF_INET, SOCK_DGRAM, 0) returned EPERM.
Two-line file fixes the daemon:
service NET_SOCKET
Bonus latent issue exposed by the same investigation: $(ROOTFS) in the Makefile depended on the binary manifest but not on the rootfs/ skeleton tree, so the cap policy I just added didn’t actually get into the image until I rm -fed the rootfs by hand. Added find rootfs -type f as a prereq.
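The missing-policy failure can be modelled in a few lines. This is a sketch under stated assumptions: the capability bit layout, the function names, and the "<kind> <capability>" line format are all hypothetical, inferred only from the one policy line shown above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical capability bits; the real kernel layout is not shown here. */
enum { CAP_BASELINE = 1u << 0, CAP_NET_SOCKET = 1u << 1 };

/* Build a capability mask from an open policy file.
 * A NULL stream models "no file in caps.d": baseline caps only.
 * Assumed line format: "<kind> <capability>", e.g. "service NET_SOCKET". */
static unsigned caps_from_policy(FILE *policy)
{
    unsigned caps = CAP_BASELINE;
    char line[128], kind[32], name[64];

    if (!policy)
        return caps;
    while (fgets(line, sizeof line, policy)) {
        if (sscanf(line, "%31s %63s", kind, name) == 2 &&
            strcmp(name, "NET_SOCKET") == 0)
            caps |= CAP_NET_SOCKET;
    }
    return caps;
}

/* Hypothetical kernel-side gate for socket(): 0 if allowed, -EPERM if not. */
static int socket_allowed(unsigned caps)
{
    return (caps & CAP_NET_SOCKET) ? 0 : -EPERM;
}
```

A process whose policy file is absent ends up with socket_allowed(caps_from_policy(NULL)) == -EPERM, which is the shape of the chronos failure.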
Lumen was crashing on certain glyphs
stb_truetype.h:3155 STBTT_assert(dy >= 0). A floating-point edge case in stb’s rasterizer where dy ends up a hair below zero after the swap-and-negate (an epsilon-negative value; -0.0 on its own would still satisfy dy >= 0). The failed check goes through a real assert() → abort() → SIGABRT. Lumen dies, the entire compositor goes with it, the user sees a black screen.
I didn’t try to fix the rasterizer — it’s third-party code and the assertion is a debug-only sanity check, not a correctness invariant. The worst case of skipping it is one misrendered scanline. So:
#define STBTT_assert(x) ((void)0)
#define STB_TRUETYPE_IMPLEMENTATION
#include "stb_truetype.h"
This was hidden by the test harness for a long time because it only fires on cold-cache cold-CPU runs where Lumen’s startup is slow enough to hit a particular glyph at a particular y-offset. With the new test infra (see below), it shows up reliably enough to be worth fixing.
libauth race
Bastion’s auth_check returned -1 on the first call in roughly half of cold boots, then succeeded on retry. It calls auth_lookup_passwd (/etc/passwd), then auth_lookup_shadow (/etc/shadow), then crypt. None of those should be racy. They are.
I didn’t track it down — possibly an ext2 readahead/cache interaction during early boot, possibly something in musl’s crypt that doesn’t like being called before some libc init has settled. Mitigated rather than fixed: Bastion’s new autologin path retries do_auth up to five times with a 200 ms sleep before falling through. Production login (where the user types their password) is on a slower interactive path and was never observably flaky, so it’s untouched. There’s a TODO with my name on it.
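The mitigation is a bounded retry loop. A minimal sketch follows, assuming the sleep happens between attempts rather than before the first one; retry_auth, struct flaky, and flaky_auth are hypothetical names, with flaky_auth standing in for do_auth.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Call attempt() up to max_tries times, sleeping delay_ms between failures.
 * Returns 0 on the first success, otherwise the last failure code. */
static int retry_auth(int (*attempt)(void *), void *ctx,
                      int max_tries, long delay_ms)
{
    int rc = -1;
    int i;

    for (i = 0; i < max_tries; i++) {
        rc = attempt(ctx);
        if (rc == 0)
            return 0;
        if (i + 1 < max_tries) {
            struct timespec ts;
            ts.tv_sec = delay_ms / 1000;
            ts.tv_nsec = (delay_ms % 1000) * 1000000L;
            nanosleep(&ts, NULL);
        }
    }
    return rc;
}

/* Stand-in for do_auth: fails a fixed number of times, then succeeds. */
struct flaky { int failures_left; int calls; };

static int flaky_auth(void *ctx)
{
    struct flaky *f = ctx;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -1;
    }
    return 0;
}
```

With the numbers from the paragraph above, the call would be retry_auth(attempt, ctx, 5, 200). A first-call failure followed by a success costs one 200 ms sleep.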
← and → in the GUI installer
Real production feature, not a test hack: arrow keys now navigate the wizard. Left = back, right = next. Esc still works (now that Esc exists), Enter still advances. Anyone who finds keyboard navigation easier than mousing through the install will get it.
This shook out two structural gaps in the external window protocol:
- CSI sequences (ESC [ A/B/C/D) weren’t reaching proxy windows. Lumen’s CSI handler only forwarded the bytes to PTY-backed windows (tag >= 0); for proxy windows the whole sequence was silently dropped. Fixed: when the focused window is a proxy, translate the four arrow CSIs into single-byte synthetic codes (0xF1–0xF4) and dispatch via on_key. ASCII never uses that range, and in UTF-8 those bytes only appear as lead bytes of four-byte sequences for code points from U+40000 up, which are rare in practice, so it’s a workable side channel until the protocol grows a real “raw key” event.
- gui-installer had no arrow handler. It does now: new KEY_ARROW_LEFT/RIGHT macros, both routed through the existing handle_back and handle_key('\r') paths so screen-state machines don’t need to learn about them.
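Both halves of the arrow-key path fit in a short sketch. Assumptions are labeled in the comments: which arrow maps to which byte in 0xF1–0xF4 is a guess, KEY_ARROW_UP/DOWN and struct wizard are hypothetical, and the handle_back / handle_key signatures are invented for the example.

```c
/* Synthetic key codes in the 0xF1-0xF4 range.
 * ASSUMPTION: the assignment of arrows to bytes is made up for this sketch. */
enum {
    KEY_ARROW_UP    = 0xF1,
    KEY_ARROW_DOWN  = 0xF2,
    KEY_ARROW_RIGHT = 0xF3,
    KEY_ARROW_LEFT  = 0xF4
};

/* Map a three-byte CSI arrow sequence (ESC [ A..D) to a synthetic code.
 * Returns 0 if the bytes are not an arrow sequence. */
static unsigned char csi_arrow_to_key(const unsigned char *seq, unsigned len)
{
    if (len != 3 || seq[0] != 0x1b || seq[1] != '[')
        return 0;
    switch (seq[2]) {
    case 'A': return KEY_ARROW_UP;
    case 'B': return KEY_ARROW_DOWN;
    case 'C': return KEY_ARROW_RIGHT;
    case 'D': return KEY_ARROW_LEFT;
    default:  return 0;
    }
}

/* Hypothetical wizard state: just the index of the current screen. */
struct wizard { int screen; };

static void handle_back(struct wizard *w)
{
    if (w->screen > 0)
        w->screen--;
}

/* Arrows reuse the existing back and Enter paths instead of adding new ones. */
static void handle_key(struct wizard *w, unsigned char c)
{
    if (c == KEY_ARROW_LEFT)
        handle_back(w);
    else if (c == KEY_ARROW_RIGHT)
        handle_key(w, '\r');
    else if (c == '\r')
        w->screen++;
}
```

Routing the arrows through the existing paths means the per-screen logic only ever sees "back" and "Enter", which is the point made in the second bullet.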
Ctrl+Alt+I from anywhere
LUMEN_OP_INVOKE already let citadel-dock ask Lumen to spawn "terminal" or "widgets". Added "gui-installer" as a third target — and bound Ctrl+Alt+I in Lumen to fire it, parallel to Ctrl+Alt+T’s dropdown terminal. Hit the chord from the desktop, get a fresh installer window, no shell required. Once Aegis grows past one-laptop-at-a-time deployments, having “summon installer” on a hotkey is going to feel less weird than it does today.
The shape of the dispatch is the same as Ctrl+Alt+T: kernel produces ESC + 0x09, lumen’s escape-sequence handler matches eb == 0x09 and calls invoke_handler(comp, "gui-installer"). New spawn_external_client() helper wraps sys_spawn with stderr plumbed back to /dev/console, so the spawned client’s dprintf(2, ...) lines reach the serial log alongside Lumen’s.
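The 0x09 in that sequence lines up with the usual control-key convention, where Ctrl+letter yields the letter's low five bits. A small sketch, assuming that is how the byte is derived; ctrl_byte and chord_target are hypothetical helpers, and only the gui-installer binding is taken from the text above.

```c
#include <stddef.h>

/* Control-key convention: Ctrl+<letter> is the letter's low five bits. */
static unsigned char ctrl_byte(char letter)
{
    return (unsigned char)(letter & 0x1f);
}

/* Return the INVOKE target for the byte that follows ESC, or NULL if the
 * byte is not bound in this sketch. */
static const char *chord_target(unsigned char eb)
{
    if (eb == 0x09)             /* 'I' & 0x1f */
        return "gui-installer";
    return NULL;
}
```

A caller in the escape-sequence handler would pass the returned name to invoke_handler when it is non-NULL and fall through to normal key handling otherwise.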
stsh learned &
$ lumen &
$
Background jobs were never supported — & was just a literal argument to whatever command preceded it. Added parse_pipeline_bg and run_pipeline_bg: the parser strips a trailing & and sets a flag, the runner forks the pipeline and returns immediately without waitpid. SIGCHLD = SIG_IGN in the shell means the kernel reaps zombies for us, so no janitor logic.
I don’t actually use this in any test anymore — the new test ISO autologins instead of driving stsh — but it’s a real shell feature people will reach for and it’s now there.
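The parse-then-fork shape looks roughly like this. It is a sketch, not stsh's code: strip_background_marker and spawn_background are hypothetical names, a single command stands in for a whole pipeline, and the SIGCHLD setup that a shell would do once at startup is done inline.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Strip a trailing '&' (plus surrounding blanks) from line, in place.
 * Returns 1 if the command should run in the background, else 0.
 * A line without a trailing '&' is left untouched. */
static int strip_background_marker(char *line)
{
    size_t n = strlen(line);

    while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t' ||
                     line[n - 1] == '\n'))
        n--;
    if (n == 0 || line[n - 1] != '&')
        return 0;
    n--;                                /* drop the '&' itself */
    while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t'))
        n--;
    line[n] = '\0';
    return 1;
}

/* Fork and exec argv without waiting for it. With SIGCHLD set to SIG_IGN,
 * POSIX systems reap the child automatically, so no waitpid() is needed.
 * Returns the child's pid, or -1 if fork() failed. */
static pid_t spawn_background(char *const argv[])
{
    pid_t pid;

    signal(SIGCHLD, SIG_IGN);           /* normally set once at shell startup */
    pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }
    return pid;
}
```

For the input "lumen &", strip_background_marker rewrites the buffer to "lumen" and returns 1, and the caller then hands the parsed argv to spawn_background instead of the foreground path.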
Test harness: stop being slow and flaky
The 1.0.1 GUI installer test passed maybe 60% of the time on cold cache and took 25–30 seconds when it worked. 1.0.2 has a new test ISO aegis-installer-test.iso (built via make installer-test-iso) that bakes bastion_autologin=root into the multiboot2 cmdline. Bastion sees the flag, skips the greeter form entirely, and authenticates with the hardcoded test password. Production builds never set the flag, so the test password isn’t exposed in a real release.
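Picking the flag out of the command line is a plain key=value scan. A minimal sketch, assuming the cmdline reaches Bastion as a single space-separated string; cmdline_get is a hypothetical helper, and how Bastion actually obtains the string is not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Look for "key=value" as a space-delimited token in cmdline.
 * On a match, copies the value into out (NUL-terminated, truncated to
 * outsz - 1 bytes) and returns 1. Returns 0 if the key is absent. */
static int cmdline_get(const char *cmdline, const char *key,
                       char *out, size_t outsz)
{
    size_t klen = strlen(key);
    const char *p = cmdline;

    if (outsz == 0)
        return 0;
    while (*p) {
        const char *tok;
        size_t tlen;

        while (*p == ' ')
            p++;
        tok = p;
        while (*p && *p != ' ')
            p++;
        tlen = (size_t)(p - tok);
        if (tlen > klen && strncmp(tok, key, klen) == 0 && tok[klen] == '=') {
            size_t vlen = tlen - klen - 1;

            if (vlen >= outsz)
                vlen = outsz - 1;
            memcpy(out, tok + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
    }
    return 0;
}
```

A production cmdline that never contains bastion_autologin makes cmdline_get return 0, so the greeter path runs as usual.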
Result: the back-button test now passes 10/10 at 6 seconds per run. The full install-and-boot-from-NVMe test passes 5/5 at 22 seconds per run. Total wall time for both: under 3 minutes.
Side fixes that came out of chasing the flake:
- vortex’s QEMU child wasn’t being killed when a test panicked. Set kill_on_drop(true) on the spawn so leaked processes don’t accumulate and race the next test for the monitor socket.
- vortex’s QEMU stderr was Stdio::piped() but never read. ~64K of stderr would fill the pipe and block QEMU mid-boot. Switched to Stdio::null().
- vortex got a VORTEX_SERIAL_TEE env hook that appends every raw serial line to a file. Indispensable for catching the two latent crashes above: wait_for_line consumes lines as it goes, so when the test panics on a missing marker you’ve usually already lost the line that explained why.
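The unread-stderr stall is ordinary pipe behavior and can be reproduced outside vortex. The sketch below is in C rather than vortex's Rust and uses a hypothetical helper, fill_unread_pipe; it writes into a pipe that nobody reads and counts how many bytes the kernel accepts before refusing more.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write into a pipe nobody reads until the kernel refuses more data.
 * Returns the number of bytes accepted, or -1 on setup failure.
 * The write end is non-blocking so the demo returns instead of hanging;
 * a blocking writer would simply stall at the same point. */
static long fill_unread_pipe(void)
{
    int fds[2];
    char chunk[4096];
    long total = 0;

    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        ssize_t n = write(fds[1], chunk, sizeof chunk);

        if (n <= 0)
            break;                      /* pipe is full (EAGAIN) or errored */
        total += n;
    }
    close(fds[0]);
    close(fds[1]);
    return total;
}
```

On Linux the default pipe capacity is typically 64 KiB, which matches the ~64K figure in the second bullet. Either draining the pipe or not creating it at all avoids the stall.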
Get it
Download v1.0.2 ISO — same shape as 1.0.1 (live boot, graphical default, text-mode entry in GRUB). Default credentials are still root / forevervigilant.
Issues to exec/aegis, security to execxd@icloud.com.
Forever vigilant.