Roadmap
Development roadmap for Aegis OS — near-term milestones, medium-term goals, and long-term vision
Aegis is v1 software. On x86_64 the kernel boots, runs a full userspace, enforces capabilities at every syscall boundary, and serves network traffic. The filesystem is writable on NVMe, signals and pipes work, musl-linked utilities run against a fairly complete syscall surface, and application processors come up cleanly. But significant work remains before Aegis can credibly claim production-grade security or broad hardware support. This page describes what comes next, roughly ordered by priority and dependency.
Contributions are welcome at any stage. File issues or propose changes at exec/aegis.
Released Versions
| Version | Date | Highlights |
|---|---|---|
| v1.0.0 | Apr 12 | Initial public release on x86_64 |
| v1.0.1 | Apr 16 | IPC/AF_UNIX/memfd hardening; Lumen freeze fixes; GUI installer wiring |
| v1.0.2 | Apr 17 | PS/2 Esc keypress fix; gui-installer arrow nav; Ctrl+Alt+I; chronos NET_SOCKET cap; deflaked installer test 0/15 → 15/15 |
| v1.0.3 | Apr 18 | 20 new coreutils (head/tail/find/test/[/env/sleep/…); kernel envp propagation in execve; sys_sethostname (170); installed-disk GRUB cfg fix; installer warns about overwriting existing Aegis; SIMPLE_USER_PROGS Makefile dep fix; stsh strerror diagnostic |
Open issues tracked for v1.0.4
- Kernel ext2 lookup ENOENT after ~11 sequential ext2-backed execve(2)s: surfaced during 1.0.3 testing. Initrd-served binaries (cat/ls/echo/sh/login/vigil) are unaffected because they never go through ext2; the new ext2-only coreutils trip the bug once enough of them have been exec’d in a single boot. Suspected cause: the 16-slot LRU block cache evicts indirect blocks, and the cache-miss path returns ENOENT instead of refilling. 9 of the 20 1.0.3 coreutils ship working but un-asserted in CI because of this.
- /bin/expand execve mystery: a single binary that fails to load even from a fresh boot, while neighbouring binaries built with the same toolchain load fine. Probably a kernel ELF loader edge case.
- Bastion libauth cold-boot race: do_auth returns -1 on the first call in roughly half of cold boots. Mitigated since 1.0.2 by retrying up to five times with a 200 ms sleep, but not root-caused.
- Keyboard regression on installed UEFI systems: covered at length in the 1.0.3 release notes. The suspect is xHCI USB-HID enumeration timing under UEFI without legacy PS/2 emulation. Reproduces on real hardware, not in QEMU.
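The suspected ext2 failure mode in the first item can be illustrated with a minimal sketch, assuming a 16-slot LRU cache keyed by block number. `Slot`, `BlockCache`, the `read_block` callback, and the 1 KB block size are hypothetical names and values chosen for illustration, not Aegis’s actual cache code. The point is the miss path: it refills from the device and propagates only a genuine I/O error, never a "not found".

```rust
// Hypothetical sketch: names, sizes, and layout are assumptions, not Aegis code.
const SLOTS: usize = 16;
const BLOCK_SIZE: usize = 1024;

#[derive(Debug, PartialEq)]
enum CacheError {
    Io, // the backing device failed to deliver the block
}

struct Slot {
    block_no: u32,
    last_used: u64,
    data: [u8; BLOCK_SIZE],
}

struct BlockCache {
    slots: Vec<Slot>,
    clock: u64, // logical time, bumped on every access
}

impl BlockCache {
    fn new() -> Self {
        BlockCache { slots: Vec::with_capacity(SLOTS), clock: 0 }
    }

    /// Return the cached block, refilling through `read_block` on a miss.
    /// Eviction only costs a re-read; it never turns into a lookup failure.
    fn get<F>(&mut self, block_no: u32, mut read_block: F) -> Result<&[u8; BLOCK_SIZE], CacheError>
    where
        F: FnMut(u32, &mut [u8; BLOCK_SIZE]) -> Result<(), CacheError>,
    {
        self.clock += 1;
        let now = self.clock;

        let idx = match self.slots.iter().position(|s| s.block_no == block_no) {
            Some(i) => i,
            None => {
                // Miss: read the block first, so a failed read leaves the cache untouched.
                let mut data = [0u8; BLOCK_SIZE];
                read_block(block_no, &mut data)?;
                let slot = Slot { block_no, last_used: now, data };
                if self.slots.len() < SLOTS {
                    self.slots.push(slot);
                    self.slots.len() - 1
                } else {
                    // Cache full: replace the least recently used slot.
                    let victim = self
                        .slots
                        .iter()
                        .enumerate()
                        .min_by_key(|(_, s)| s.last_used)
                        .map(|(i, _)| i)
                        .unwrap();
                    self.slots[victim] = slot;
                    victim
                }
            }
        };
        self.slots[idx].last_used = now;
        Ok(&self.slots[idx].data)
    }
}
```

With this shape, touching a seventeenth distinct block evicts the oldest one, and a later request for the evicted block simply triggers another device read.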
Near-Term: ARM64 Port
The ARM64 port reached working userspace on qemu-system-aarch64 -machine virt on 2026-04-12, roughly a week after the v1.0.0 x86_64 release. Development is on the arm64-port branch; merge to master is pending real-hardware verification on Raspberry Pi 5. See ARM64.md and PI5.md in the repository for the full port plan and Pi 5 flashing guide.
What works today on QEMU virt
A single aegis-arm64.elf boots end-to-end on qemu-system-aarch64 -machine virt with either -cpu cortex-a72 -machine virt (GICv2, Pi 4 profile) or -cpu cortex-a76 -machine virt,gic-version=3 (GICv3, Pi 5 profile) through to an interactive aegis@aegis:/# stsh shell:
- Boot path: boot.S produces a valid Linux arm64 Image with a 64-byte header, detects EL2 entry, drops to EL1 with full register sanitization (SCTLR_EL2, HCR_EL2, CNTHCTL_EL2, CNTVOFF_EL2, CPTR_EL2, HSTR_EL2, VTTBR_EL2, VPIDR_EL2, VMPIDR_EL2, ICC_SRE_EL2), invalidates I-cache and TLB before MMU enable, and builds inline TTBR0/TTBR1 page tables covering up to 8 GB of physical memory.
- MMU: 4KB granule, 4-level paging, 48-bit virtual addresses, 40-bit physical addresses (required for Pi 5’s peripheral region above 4 GB). Kernel higher-half at 0xFFFF000000000000. Block mappings at L1 for the first eight 1 GB windows with DEVICE attributes on the peripheral blocks and NORMAL cacheable on RAM blocks.
- Exception vectors: full 16-entry vector table in vectors.S with EL0/EL1 sync/IRQ paths, SPSR sanitization against privilege escalation via crafted signal frames, and per-EL handlers that save/restore ELR_EL1/SPSR_EL1 correctly across nested preemption (fixing a subtle hazard where kernel-mode IRQs during long syscalls would clobber return state).
- Context switch: ctx_switch.S saves/restores callee-saved x19-x28 plus fp/lr (x29/x30) and SIMD state via shared scheduler code; proc_enter.S handles the EL1→EL0 trampoline for both initial spawn and fork-return paths.
- Syscall entry: SVC #0 dispatches through _exc_sync_el0, pulls the syscall number from x8, and reuses the shared syscall_dispatch table from x86. A small ARM64→x86 translation table in kernel/syscall/syscall.c maps Linux-aarch64 syscall numbers to Aegis’s x86 numbering, so musl aarch64 binaries just work.
- Interrupt controller: runtime-dispatched GICv2 (gic.c) and GICv3 (gic_v3.c) drivers. Version selection happens at boot based on the DTB compatible string (arm,gic-400 / arm,cortex-a15-gic → v2; arm,gic-v3 / arm,gic-600 → v3), or a CPU-feature-register fallback (ID_AA64PFR0_EL1.GIC) when no DTB is available. The v3 driver configures the distributor with ARE_NS|Group1NS, wakes the per-CPU redistributor, and uses ICC_*_EL1 system registers for ack/EOI. Every spinloop has a bounded iteration count and emits a [GICv3] TIMEOUT diagnostic on failure, so real-silicon hangs produce actionable error messages instead of silence.
- Timer: ARM generic timer at 100 Hz via CNTP_TVAL_EL0 + CNTP_CTL_EL0, routed through the GIC as PPI 30.
- DTB walker: arch_mm.c includes a recursive cell-tracking device-tree parser with per-node #address-cells / #size-cells + ranges propagation. It correctly handles both QEMU virt’s flat layout and Pi 5’s nested /soc@107c000000/ hierarchy, where child nodes use SoC-local addresses that must be translated through the parent’s ranges window. It parses /memory, /intc (wherever it lives in the tree), and /reserved-memory; the last captures TF-A’s 0x0-0x80000 reservation on Pi 5 so the PMM doesn’t hand those pages out.
- PL011 UART: runtime-probed across both QEMU virt (0x09000000) and Pi 5 BCM2712 (0x107D001000) physical bases by reading UARTPeriphID0 at offset 0xFE0. The UART is firmware-initialized on both platforms; the kernel just latches onto the live base. There is also a pre-MMU “I’m alive” diagnostic directly from boot.S that writes to both candidate addresses before any C code runs, using a temporary exception vector table so faulting probes don’t hang the CPU.
- Rust capability core: kernel/cap/ cross-compiles cleanly to aarch64-unknown-none without a single source change. The existing no_std crate just needs the target added to rust-toolchain.toml and the Makefile wired up.
- Userspace: real aarch64-musl-built vigil + stsh + login + echo/cat/ls embedded via the same objcopy blob pipeline the x86 build uses. Login authenticates against /etc/shadow with libauth’s SHA-512 crypt, spawns stsh, and the user gets a working interactive shell.
- Testing: tests/tests/boot_oracle_arm64.rs runs the arm64 kernel under QEMU via Vortex’s QemuBackend (VORTEX_QEMU_BINARY=qemu-system-aarch64), asserts the 13-line boot subsequence, and has been wired into the GitHub Actions CI matrix alongside the x86 job.
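The interrupt-controller selection described in the list above can be sketched as follows. This is a minimal illustration, not the logic in gic.c or gic_v3.c: the function names and the `GicVersion` enum are hypothetical, and the sketch assumes an empty compatible list stands for "no DTB available". It also falls back to the CPU feature register when no listed string matches, which is an assumption beyond what the page states. The compatible strings come from the text; the GIC field is bits [27:24] of ID_AA64PFR0_EL1, where a nonzero value indicates the GIC system register interface is implemented.

```rust
// Hypothetical sketch of boot-time GIC driver selection.
#[derive(Debug, PartialEq, Clone, Copy)]
enum GicVersion {
    V2,
    V3,
}

/// Map one DTB `compatible` string to a driver version.
fn gic_from_compatible(compatible: &str) -> Option<GicVersion> {
    match compatible {
        "arm,gic-400" | "arm,cortex-a15-gic" => Some(GicVersion::V2),
        "arm,gic-v3" | "arm,gic-600" => Some(GicVersion::V3),
        _ => None,
    }
}

/// Fallback: inspect the GIC field, bits [27:24] of ID_AA64PFR0_EL1.
/// Nonzero means the CPU implements the GIC system register interface.
fn gic_from_pfr0(id_aa64pfr0_el1: u64) -> GicVersion {
    if ((id_aa64pfr0_el1 >> 24) & 0xF) != 0 {
        GicVersion::V3
    } else {
        GicVersion::V2
    }
}

/// Prefer the DTB; use the feature register when the DTB gives no answer.
fn select_gic(dtb_compatibles: &[&str], id_aa64pfr0_el1: u64) -> GicVersion {
    dtb_compatibles
        .iter()
        .find_map(|c| gic_from_compatible(c))
        .unwrap_or_else(|| gic_from_pfr0(id_aa64pfr0_el1))
}
```

For example, `select_gic(&["arm,gic-400"], 0)` picks the v2 driver, while `select_gic(&[], 1 << 24)` picks v3 from the register alone.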
What’s next for ARM64
- Real-silicon verification on Raspberry Pi 5: the build/pi5-image/ directory produces a ready-to-flash FAT32 layout (DTB + config.txt + kernel_2712.img, ~1 MB total), and tools/build-pi5-image.sh fetches the Pi firmware blobs from the upstream raspberrypi/firmware repo. First silicon boot is pending the arrival of a USB-TTL serial cable wired to the Pi 5’s dedicated JST-SH debug header (GPIO 14/15 is behind the RP1 south bridge and requires an RP1 driver that doesn’t exist yet). Once verified, the branch merges to master.
- RP1 south bridge driver: Pi 5 houses its USB, Ethernet, GPIO, and additional UARTs behind a custom PCIe-attached chip. Bring-up requires a PCIe host controller driver for the BCM2712 internal root complex plus RP1 register-level knowledge. Until this lands, networking and USB on Pi 5 are unavailable; everything flows through the JST-SH debug console. Circle’s lib/southbridge.cpp is the reference implementation.
- SMP via PSCI: Pi 5 and QEMU virt both expose PSCI at EL3 (CPU_ON function ID 0xC4000003 via smc #0). Bringing up the remaining three cores is straightforward in principle; the catch on Pi 5 is that TF-A already holds the secondaries in a mailbox spin loop, so CPU_ON returns PSCI_E_ALREADY_ON and the kernel must use TF-A’s mailbox entry-point mechanism instead. Deferred until single-core is verified on silicon.
- IOMMU / SMMU: Aegis’s x86 roadmap item for DMA isolation applies equally on ARM. SMMUv3 on Pi 5 is exposed to non-secure software in the DTB; programming it is a prerequisite for safely running RP1 drivers with DMA.
- Full bcm2712d0.dtbo support: newer Pi 5 2 GB / 16 GB boards with BCM2712 D0 stepping need a DT overlay for correct peripheral initialization. The packager ships the overlay conditionally; the kernel has to parse applied-overlay nodes from the DTB rather than the base DTS.
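The PSCI bring-up decision in the SMP item above could look roughly like this. It is a sketch under stated assumptions: `start_secondary`, the `BringUp` enum, and the `smc` callback (a stand-in for issuing `smc #0` with the function ID and three arguments) are hypothetical. The CPU_ON function ID is the one quoted in the text, and the status values 0 (success) and -4 (already on) follow the PSCI specification.

```rust
// Hypothetical sketch: decide how to start a secondary core.
const PSCI_CPU_ON_SMC64: u32 = 0xC400_0003;
const PSCI_SUCCESS: i64 = 0;
const PSCI_E_ALREADY_ON: i64 = -4;

#[derive(Debug, PartialEq)]
enum BringUp {
    /// Firmware accepted CPU_ON; the core will enter at `entry`.
    StartedViaPsci,
    /// Firmware reports the core as already on (the Pi 5 case described above),
    /// so it must be released through the firmware's mailbox mechanism instead.
    NeedsMailboxRelease,
    /// Any other PSCI status is treated as a failure and reported as-is.
    Failed(i64),
}

/// `smc` stands in for the real secure monitor call:
/// (function id, target MPIDR, entry point, context id) -> PSCI status.
fn start_secondary<F>(mut smc: F, target_mpidr: u64, entry: u64) -> BringUp
where
    F: FnMut(u32, u64, u64, u64) -> i64,
{
    match smc(PSCI_CPU_ON_SMC64, target_mpidr, entry, 0) {
        PSCI_SUCCESS => BringUp::StartedViaPsci,
        PSCI_E_ALREADY_ON => BringUp::NeedsMailboxRelease,
        status => BringUp::Failed(status),
    }
}
```

Passing the call as a closure keeps the decision logic testable on the host, with the actual `smc` instruction confined to a thin architecture-specific wrapper.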
Near-Term: Kernel Feature Completion
These items are the remaining gaps before the security model is fully demonstrable on x86_64. They are largely independent of each other and can be worked on in parallel.
TCP Polish
The network stack is implemented end-to-end (TCP, UDP, IP, ARP, DHCP client, poll/select/epoll, SO_REUSEADDR, SO_REUSEPORT) and carries real traffic: the in-tree httpd serves requests to the host, and curl -sk https://example.com works from the shell against real internet endpoints. The earlier tcp_send_segment / tcp_lock race was resolved before 1.0.0 shipped; the remaining work is send segmentation, per-connection TX buffering, and proper flow-control window accounting (Phase 49). These are required for a real SSH server (Phase 50) to stay healthy under load.
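As a rough illustration of what send segmentation with window accounting involves, here is a minimal sketch. `TxState` and its methods are hypothetical and do not describe the Aegis TCP code; the field names follow the conventional TCP variable names (SND.UNA, SND.NXT, SND.WND). Congestion control, retransmission, and the TX buffer itself are deliberately left out.

```rust
// Hypothetical per-connection send state; not Aegis's actual structure.
struct TxState {
    snd_una: u32, // oldest unacknowledged sequence number
    snd_nxt: u32, // next sequence number to send
    snd_wnd: u32, // peer's advertised receive window, in bytes
    mss: u32,     // maximum segment size, in bytes
}

impl TxState {
    /// Bytes sent but not yet acknowledged. Wrapping subtraction keeps this
    /// correct when sequence numbers wrap past u32::MAX.
    fn in_flight(&self) -> u32 {
        self.snd_nxt.wrapping_sub(self.snd_una)
    }

    /// How many more bytes the peer's window allows right now.
    fn usable_window(&self) -> u32 {
        self.snd_wnd.saturating_sub(self.in_flight())
    }

    /// Split up to `queued` buffered bytes into segment lengths that fit both
    /// the MSS and the usable window, advancing snd_nxt for each segment.
    fn segment(&mut self, queued: u32) -> Vec<u32> {
        let mut segments = Vec::new();
        if self.mss == 0 {
            return segments; // nothing sensible to send without an MSS
        }
        let mut budget = queued.min(self.usable_window());
        while budget > 0 {
            let len = budget.min(self.mss);
            segments.push(len);
            self.snd_nxt = self.snd_nxt.wrapping_add(len);
            budget -= len;
        }
        segments
    }

    /// Accept an ACK that falls within [snd_una, snd_nxt] and take the
    /// window it advertises; anything outside that range is ignored.
    fn on_ack(&mut self, ack: u32, window: u32) {
        let acked = ack.wrapping_sub(self.snd_una);
        if acked <= self.in_flight() {
            self.snd_una = ack;
            self.snd_wnd = window;
        }
    }
}
```

With a 4000-byte window and an MSS of 1460, a 10,000-byte write produces segments of 1460, 1460, and 1080 bytes; the rest stays queued until an ACK opens the window again.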
sys_setitimer for interval connection timeouts is also still missing — most server code today relies on poll timeouts instead. POSIX timers (setitimer / alarm / timerfd_*) are queued as Phase 51.
Milestone: A sustained HTTP load test against Aegis httpd runs for 24 hours with zero hung connections or dropped retransmits.
Timer Infrastructure
Replace PIT-only timing with HPET or TSC-based high-resolution timers. clock_gettime already exists but runs at PIT resolution; a better clock source would enable nanosecond precision, per-process CPU time accounting, and tighter scheduling jitter.
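One common way to turn a TSC reading into nanoseconds without a division on the hot path is a precomputed multiplier and shift. The sketch below shows that technique under the assumption that the TSC frequency has already been calibrated to a nonzero value. The function names and the shift of 32 are illustrative choices, not an Aegis design decision.

```rust
// Hypothetical sketch: fixed-point TSC-to-nanosecond conversion.
const SHIFT: u32 = 32;

/// Precompute mult = 10^9 * 2^SHIFT / tsc_hz once, after calibration.
/// `tsc_hz` must be nonzero; the caller is assumed to have calibrated it.
fn tsc_mult(tsc_hz: u64) -> u64 {
    ((1_000_000_000u128 << SHIFT) / tsc_hz as u128) as u64
}

/// ns = (ticks * mult) >> SHIFT, using a 128-bit intermediate so the
/// multiplication cannot overflow for any 64-bit tick count.
fn tsc_to_ns(ticks: u64, mult: u64) -> u64 {
    ((ticks as u128 * mult as u128) >> SHIFT) as u64
}
```

For a 2 GHz TSC the multiplier is exactly 2^31, so 2,000,000,000 ticks convert to 1,000,000,000 ns. For frequencies that do not divide evenly, the truncated multiplier introduces a small drift, which is the usual trade-off of this scheme.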
TinySSH
A minimal SSH server for remote access. Statically linked, capability-gated (CAP_NET for the listener, CAP_AUTH for session authentication). This is the first real test of the capability model under adversarial conditions — an internet-facing service that must be confined.
Security Audit
At feature completion, the kernel will be roughly 10-15K lines of C plus the Rust capability core. That is small enough for a serious line-by-line audit of every syscall path, every capability check, and every pointer validation. This is the gate before anything runs in a non-experimental environment.
Medium-Term Goals
Rust Migration
The capability validation core (kernel/cap/src/lib.rs) is the first kernel subsystem written in Rust, compiled as a #![no_std] staticlib and linked via C FFI. This is the template for a gradual, subsystem-by-subsystem migration.
Planned migration order:
- Capability system: done. cap_init, cap_grant, and cap_check are Rust.
- Syscall dispatch: the syscall table and argument validation layer. This is the primary attack surface for userspace-to-kernel exploitation. Rust’s bounds checking and type safety would eliminate entire classes of bugs here.
- Drivers: DMA descriptor processing, MMIO register access, ring buffer management. The security considerations section of the driver documentation lists the specific risks: no IOMMU, unchecked descriptor lengths, trusted MMIO reads. Rust safe abstractions for MMIO and DMA would address these structurally.
- Memory management: the VMM and VMA subsystems. These are the most complex and the last to migrate, because they are deeply intertwined with architecture-specific assembly.
Each subsystem follows the same pattern: #![no_std] crate, extern "C" FFI boundary, staticlib linked into the kernel. No Rust types leak into C. The C-to-Rust boundary is explicit and auditable.
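The boundary pattern described above can be sketched in a few lines. Everything here is hypothetical: the function `cap_check_sketch`, its mask-based signature, the bit assignments for CAP_NET and CAP_AUTH, and the result codes are invented for illustration, since this page does not give the real cap_check signature. A real kernel crate would also declare `#![no_std]`, build as a staticlib, and export the symbol unmangled; those pieces are omitted so the sketch compiles as an ordinary program.

```rust
// Hypothetical sketch of a C-facing Rust entry point.

/// Capability bits (values invented for this example).
pub const CAP_NET: u32 = 1 << 0;
pub const CAP_AUTH: u32 = 1 << 1;

/// Result codes are plain integers, so no Rust enum crosses the boundary.
pub const CAP_OK: i32 = 0;
pub const CAP_DENIED: i32 = -1;

/// Internal logic is ordinary safe Rust and is free to use Rust types.
fn holds_all(held: u32, required: u32) -> bool {
    (held & required) == required
}

/// FFI entry point: only fixed-width integers go in and come out, which keeps
/// the C declaration trivial and the boundary easy to audit.
pub extern "C" fn cap_check_sketch(held_mask: u32, required_mask: u32) -> i32 {
    if holds_all(held_mask, required_mask) {
        CAP_OK
    } else {
        CAP_DENIED
    }
}
```

On the C side, the matching declaration under these assumptions would be a single prototype taking two `uint32_t` arguments and returning `int32_t`.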
virtio-gpu Support
Paravirtualized GPU for QEMU guests. Enables hardware-accelerated 2D rendering for the Lumen compositor without requiring a real GPU driver. The virtio-gpu protocol supports scanout configuration, 2D resource creation, and transfer-to-host operations — enough for a composited desktop without 3D.
AMD iGPU Driver (RDNA2)
Scoped to modesetting and SDMA 2D blits — not a full AMDGPU driver. Enough to set a display mode and accelerate framebuffer operations on AMD APUs (Ryzen 6000+). The full AMDGPU register space is enormous; this targets the minimum viable subset for a server or kiosk display.
Intel igb NIC Driver
The igb family covers most modern Intel server and desktop NICs (I210, I211, I350, I354). Adding igb alongside the existing RTL8169 driver gives Aegis viable network support on the majority of x86 hardware. The driver model is similar to RTL8169: descriptor rings, MMIO registers, polling-mode operation.
Apple Virtualization Framework Support
QEMU is the current development and testing platform, but Apple’s Virtualization.framework provides native hypervisor support on Apple Silicon Macs with significantly better performance. With the ARM64 port now booting on QEMU virt (see above), the remaining work is virtio-mmio transport support (Virtualization.framework exposes virtio devices over MMIO, not PCI) and adapting the boot entry contract to Virtualization.framework’s DTB handoff. The Vortex VM management tool already plans for this integration path, and qemu-system-aarch64 -accel hvf on Apple Silicon already provides a native-speed development loop for the ARM64 kernel in the interim.
SMP Scheduling Maturity
Application processors are brought up via INIT-SIPI-SIPI and enter an idle loop with per-CPU GDT/TSS/LAPIC state, so multi-core boot works today. What remains is turning that foothold into a real multi-core scheduler:
- Per-CPU run queues with work-stealing for load balancing (the scheduler currently runs workloads primarily on the BSP)
- Fine-grained locking on shared kernel state (PMM bitmap, VMM, global task table); the current spinlocks are coarse and were written for a uniprocessor invariant
- IPI-driven TLB shootdown on vmm_unmap and cross-CPU preemption wakeup
- Per-CPU LAPIC timer replacing the shared PIT tick
- KPTI (kernel page-table isolation): separate user/kernel PML4 per CPU to mitigate Meltdown-class side-channel attacks. Capabilities do not address microarchitectural information leaks; KPTI does.
A lock audit of all global kernel state is a prerequisite for any of this landing safely.
Milestone: -smp 4 in QEMU boots all cores, the scheduler distributes workers across them, and HTTP load testing shows throughput scaling with core count.
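To make the per-CPU run queue item concrete, here is a minimal single-threaded sketch of work stealing. The `Scheduler` type, `TaskId`, and the "steal from the back of the longest queue" policy are assumptions chosen for illustration, and the standard-library `VecDeque` stands in for whatever queue a `no_std` kernel would use. A real implementation would also need a lock or lock-free deque per queue, which ties into the lock audit mentioned above.

```rust
use std::collections::VecDeque;

// Hypothetical sketch: one run queue per CPU, with stealing when idle.
type TaskId = u32;

struct Scheduler {
    queues: Vec<VecDeque<TaskId>>,
}

impl Scheduler {
    fn new(cpus: usize) -> Self {
        Scheduler { queues: (0..cpus).map(|_| VecDeque::new()).collect() }
    }

    /// Add a runnable task to the given CPU's queue.
    fn enqueue(&mut self, cpu: usize, task: TaskId) {
        self.queues[cpu].push_back(task);
    }

    /// Pick the next task for `cpu`: take from the front of the local queue,
    /// otherwise steal from the back of the longest queue on another CPU.
    fn pick_next(&mut self, cpu: usize) -> Option<TaskId> {
        if let Some(task) = self.queues[cpu].pop_front() {
            return Some(task);
        }
        let victim = (0..self.queues.len())
            .filter(|&i| i != cpu)
            .max_by_key(|&i| self.queues[i].len())?;
        self.queues[victim].pop_back()
    }
}
```

Stealing from the opposite end of the victim's queue is a common choice because it reduces contention with the owner, which keeps consuming from the front.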
Long-Term Vision
Self-Hosting
Aegis should be able to build itself. This requires a working C compiler (likely a port of a small C compiler like cproc or chibicc), an assembler, a linker, and make. Self-hosting is a strong correctness signal — it exercises the filesystem, process model, memory management, and syscall interface under sustained, complex workloads.
Package Management
A minimal package manager for installing, updating, and removing statically-linked binaries. Packages carry capability policy metadata — installing httpd also installs its /etc/aegis/caps.d/httpd policy file. No dynamic dependency resolution (all binaries are statically linked against musl).
AMD iGPU Acceleration (Full Path)
The medium-term AMD iGPU entry targets modesetting and SDMA 2D blits — enough to drive a display and accelerate memcpy-class framebuffer operations. The long-term goal is the rest of the stack: GFX ring submission, shader compilation, command buffer management, and enough of the Display Core Next (DCN) interface to drive the compositor through the GPU instead of writing pixels from the CPU.
Scoping this honestly: a full AMDGPU-equivalent driver is ~500K lines in Linux and tracks a moving firmware ABI. Aegis will not attempt feature parity. The target is enough acceleration that Lumen’s compositing pipeline — dirty-rect blits, frosted glass, window scaling, cursor overlay — runs on the GPU instead of the CPU, with the CPU-path software rasterizer remaining as a fallback. Vulkan, OpenGL, and compute workloads are out of scope.
The dependency chain: IOMMU support first (DMA attacks are unacceptable for a shared-memory driver), then firmware loading (PSP, SMU, SDMA blobs), then ring submission plumbing, then a narrow command stream for 2D composition. Each layer is independently testable via PCIe passthrough on real hardware.
Broader Hardware Support
- IOMMU programming (VT-d / AMD-Vi) for DMA isolation — the single most impactful security improvement for the driver layer, and a prerequisite for the AMD iGPU work above
- AHCI/SATA for older storage hardware
- More NIC families as needed
- Real GPU drivers beyond the scoped AMD iGPU target
- Audio (likely virtio-sound first, then Intel HDA)
Real Hardware Coverage
Aegis already boots on real x86_64 hardware — the current reference machine is a ThinkPad X13 Gen 1 (Ryzen 7 4750U), where the kernel comes up cleanly through ACPI, brings up NVMe storage, and runs the full userspace including Lumen and the desktop shell. The remaining gap on that machine is networking: it has no built-in Ethernet, and Aegis does not yet have a USB-C / USB Ethernet driver. USB NIC support (likely starting with the ASIX AX88179 or RTL8153 families that ship in most USB-C dongles) is the next concrete real-hardware milestone.
Beyond that, the long tail is the usual: more NIC families, more storage controllers, ACPI quirks for machines that differ meaningfully from QEMU’s emulated platform, and eventually GPU initialization on hardware where UEFI GOP isn’t enough.
This roadmap reflects the state of development as of April 2026. Priorities may shift as work progresses. The best way to influence direction is to contribute — exec/aegis.