Boot Process

Aegis boots via the Multiboot2 protocol. GRUB loads the kernel ELF image at physical address 0x100000 and hands off control in 32-bit protected mode. Aegis then enables PAE paging, switches to long mode, and jumps into a higher-half kernel mapped at 0xFFFFFFFF80000000.

This page documents the complete boot path from the first instruction after GRUB to the scheduler’s first context switch.

v1 maturity notice. Aegis v1 is the first version deemed ready for public release – it is not a mature or production-hardened system. The boot sequence initializes security features like SMAP, SMEP, and the capability model, but these are v1 implementations in a predominantly C codebase. There are likely exploitable vulnerabilities throughout the kernel, as would be expected with any from-scratch OS at this stage. The kernel is undergoing a gradual translation from C to Rust; kernel/cap/ (the capability system) is already written in Rust and represents the beginning of this migration path. Contributions are welcome – file issues or propose changes at exec/aegis.

Linker Layout

The linker script (tools/linker.ld) defines two address regimes:

OUTPUT_FORMAT("elf64-x86-64")
ENTRY(_start)

PHYS_BASE = 0x100000;
KERN_VMA  = 0xFFFFFFFF80000000;

SECTIONS {
    . = PHYS_BASE;
    .multiboot : { KEEP(*(.multiboot)) }     /* VMA = LMA = physical */
    .text.boot : { *(.text.boot) }           /* VMA = LMA = physical */
    . += KERN_VMA;
    .text   : AT(ADDR(.text)   - KERN_VMA) { ... }
    .rodata : AT(ADDR(.rodata) - KERN_VMA) { ... }
    .data   : AT(ADDR(.data)   - KERN_VMA) { ... }
    .bss    : AT(ADDR(.bss)    - KERN_VMA) { ... }
    _kernel_end = .;
}

Key properties:

Section             VMA                LMA       Purpose
.multiboot          Physical           Physical  Multiboot2 header (kept within the first 8KB; Multiboot2 requires it within the first 32KB, 8-byte aligned)
.text.boot          Physical           Physical  32-bit entry, GDT, physical trampoline
.text               KERN_VMA + offset  Physical  64-bit kernel code (higher-half)
.rodata/.data/.bss  KERN_VMA + offset  Physical  Kernel data (higher-half)

The .multiboot and .text.boot sections have VMA = LMA so that code executing before paging can reference labels directly. All other sections have higher-half VMAs with physical LMAs via the AT() directive.
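The VMA/LMA relationship set up by the AT() directives can be expressed as a pair of conversions. The following is a minimal sketch in C; the helper names kern_virt_to_phys and kern_phys_to_virt are hypothetical and are not taken from the Aegis source.

```c
#include <stdint.h>

#define PHYS_BASE 0x100000ULL
#define KERN_VMA  0xFFFFFFFF80000000ULL

/* Higher-half VMA -> load (physical) address.
 * Mirrors AT(ADDR(section) - KERN_VMA) in the linker script. */
static inline uint64_t kern_virt_to_phys(uint64_t vma)
{
    return vma - KERN_VMA;
}

/* Physical address inside the kernel image -> higher-half VMA. */
static inline uint64_t kern_phys_to_virt(uint64_t pa)
{
    return pa + KERN_VMA;
}
```

Under these definitions, the first byte loaded at PHYS_BASE corresponds to the virtual address KERN_VMA + 0x100000.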

Multiboot2 Header

The multiboot2 header (boot.asm, section .multiboot) is placed within the first 8KB of the binary:

multiboot_header_start:
    dd MULTIBOOT2_MAGIC          ; 0xE85250D6
    dd MULTIBOOT2_ARCH           ; 0 = i386 (32-bit protected mode entry)
    dd (multiboot_header_end - multiboot_header_start)
    dd -(MULTIBOOT2_MAGIC + MULTIBOOT2_ARCH + ...)  ; checksum

    ; Framebuffer request tag (type=5)
    dw 5                         ; MULTIBOOT_HEADER_TAG_FRAMEBUFFER
    dw 0                         ; required (not optional)
    dd 20                        ; tag size
    dd 0                         ; width = any (native resolution)
    dd 0                         ; height = any
    dd 32                        ; depth = 32bpp required

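    align 8, db 0                ; tags must start on 8-byte boundaries (the 20-byte tag above ends at offset 36)

    ; End tag
    dw 0                         ; type = 0 (end)
    dw 0                         ; flags
    dd 8                         ; tag size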
multiboot_header_end:

The framebuffer tag requests a 32-bpp linear framebuffer at the native resolution. GRUB honors this together with gfxpayload=keep in grub.cfg.
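The Multiboot2 specification requires that magic, architecture, header length, and checksum sum to zero modulo 2^32, and that every header tag starts on an 8-byte boundary. The sketch below works through that arithmetic in C. The helper names are hypothetical, and the 48-byte total assumes exactly the header shown above (16-byte fixed header, one 20-byte framebuffer tag padded to 24, 8-byte end tag); it is not necessarily the expression used in boot.asm.

```c
#include <stdint.h>

#define MULTIBOOT2_MAGIC 0xE85250D6u
#define MULTIBOOT2_ARCH  0u

/* Round a tag size up to the next 8-byte boundary. */
static inline uint32_t mb2_align8(uint32_t size)
{
    return (size + 7u) & ~7u;
}

/* Total header length: fixed fields + padded framebuffer tag + end tag. */
static inline uint32_t mb2_header_length(void)
{
    return 16u + mb2_align8(20u) + 8u;
}

/* Checksum such that magic + arch + length + checksum == 0 (mod 2^32). */
static inline uint32_t mb2_header_checksum(uint32_t header_length)
{
    return (uint32_t)(0u - (MULTIBOOT2_MAGIC + MULTIBOOT2_ARCH + header_length));
}
```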

Stage 1: 32-bit Entry (_start)

GRUB transfers control to _start in 32-bit protected mode with:

Register  Value             Purpose
EAX       0x36D76289        Multiboot2 magic
EBX       Physical address  Pointer to multiboot2 info structure
CPU       Protected mode    Interrupts disabled, paging off

Preserving Bootloader Arguments

The first two instructions save the multiboot2 arguments into the SysV AMD64 ABI parameter registers. These registers survive the entire mode transition untouched:

mov edi, eax    ; mb_magic  -> RDI (first arg)
mov esi, ebx    ; mb_info   -> RSI (second arg)

Building Page Tables

Five page tables are allocated from .bss (zeroed by GRUB at load time). Since .bss labels have higher-half VMAs, the code uses (label - KERN_VMA) to compute physical addresses at assemble time:

PML4 Table (pml4_table)
  [0]   -> pdpt_lo   (identity map)
  [511] -> pdpt_hi   (higher-half kernel)

pdpt_lo
  [0] -> pd_lo

pdpt_hi
  [510] -> pd_hi     (PDPT_HI_IDX = (KERN_VMA >> 30) & 0x1FF = 510)

pd_lo
  [0..511] -> 512 x 2MB huge pages (identity: PA 0..1GB)

pd_hi
  [0..3]   -> 4 x 2MB huge pages (kernel: PA 0..8MB)

The identity map covers the first 1GB via 512 huge pages in pd_lo. The higher-half map covers 8MB (four 2MB pages in pd_hi), sufficient for the kernel binary, BSS, and any GRUB-placed multiboot2 info.
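The table indices above follow directly from the x86-64 4-level paging layout. The sketch below derives them in C and fills a page directory with 2MB entries. The function names are hypothetical, and the flag choice (present + writable + huge) is an assumption, since the boot code's exact flag bits are not shown here.

```c
#include <stdint.h>

#define KERN_VMA       0xFFFFFFFF80000000ULL
#define PTE_PRESENT    (1ULL << 0)
#define PTE_WRITABLE   (1ULL << 1)
#define PTE_HUGE       (1ULL << 7)   /* PS bit: entry maps a 2MB page */
#define HUGE_PAGE_SIZE (2ULL << 20)

/* Each paging level consumes 9 bits of the virtual address. */
static inline unsigned pml4_index(uint64_t va) { return (unsigned)((va >> 39) & 0x1FF); }
static inline unsigned pdpt_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1FF); }
static inline unsigned pd_index(uint64_t va)   { return (unsigned)((va >> 21) & 0x1FF); }

/* Map `count` consecutive 2MB pages starting at physical address 0. */
static void pd_fill_huge(uint64_t pd[512], unsigned count)
{
    for (unsigned i = 0; i < count && i < 512; i++)
        pd[i] = (i * HUGE_PAGE_SIZE) | PTE_PRESENT | PTE_WRITABLE | PTE_HUGE;
}
```

With count = 512 this corresponds to pd_lo (1GB identity map); with count = 4 it corresponds to pd_hi (8MB kernel window).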

Enabling Long Mode

The mode transition follows the Intel-prescribed sequence:

1. CR4.PAE = 1              Enable Physical Address Extension
2. CR3 = pml4_table (phys)  Load page table root
3. EFER.LME = 1             Enable Long Mode
   EFER.NXE = 1             Enable No-Execute page support
4. GDT loaded via lgdt       64-bit code/data descriptors
5. CR0.PG = 1               Enable paging (activates long mode)
   CR0.WP = 1               Write-protect for ring-0
6. Far jump 0x08:long_mode_phys  Reload CS, enter 64-bit mode
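The control bits named in the sequence above sit at fixed architectural positions. As a reference sketch in C (the helper names are hypothetical; the real transition is performed in assembly in boot.asm):

```c
#include <stdint.h>

#define CR4_PAE  (1u << 5)      /* Physical Address Extension */
#define MSR_EFER 0xC0000080u    /* Extended Feature Enable Register */
#define EFER_LME (1u << 8)      /* Long Mode Enable */
#define EFER_NXE (1u << 11)     /* No-Execute Enable */
#define CR0_WP   (1u << 16)     /* Write Protect (applies to ring 0) */
#define CR0_PG   (1u << 31)     /* Paging */

/* Step 1: value to write back to CR4. */
static inline uint32_t boot_cr4(uint32_t cr4)
{
    return cr4 | CR4_PAE;
}

/* Step 3: low 32 bits to write back to the EFER MSR. */
static inline uint32_t boot_efer_lo(uint32_t efer_lo)
{
    return efer_lo | EFER_LME | EFER_NXE;
}

/* Step 5: value to write back to CR0; setting PG with LME=1 activates long mode. */
static inline uint32_t boot_cr0(uint32_t cr0)
{
    return cr0 | CR0_PG | CR0_WP;
}
```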

The GDT used during boot lives in .text.boot (physical VMA) so lgdt works before paging is active:

Index  Selector  Description
0      0x00      Null descriptor
1      0x08      64-bit code: P=1, DPL=0, L=1, Type=code/execute/read
2      0x10      64-bit data: P=1, DPL=0, Type=data/read/write
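The descriptor fields listed above can be packed into 8-byte GDT entries as follows. This is a minimal sketch in C with hypothetical helper names; it leaves base and limit at zero (they are ignored for code and data segments in 64-bit mode), whereas the actual boot GDT in boot.asm may fill the limit fields.

```c
#include <stdint.h>

#define GDT_ACCESS_PRESENT 0x80u  /* P=1 */
#define GDT_ACCESS_SEGMENT 0x10u  /* S=1: code/data, not a system descriptor */
#define GDT_ACCESS_EXEC    0x08u  /* executable (code segment) */
#define GDT_ACCESS_RW      0x02u  /* readable code / writable data */
#define GDT_FLAG_LONG      0x2u   /* L=1: 64-bit code segment */

/* Pack access byte (bits 40..47) and flags nibble (bits 52..55). */
static inline uint64_t gdt_entry(uint8_t access, uint8_t flags)
{
    return ((uint64_t)access << 40) | ((uint64_t)(flags & 0xFu) << 52);
}

/* Selector = index * 8, with the requested privilege level in the low bits. */
static inline uint16_t gdt_selector(unsigned index, unsigned rpl)
{
    return (uint16_t)((index << 3) | (rpl & 3u));
}
```

Under these assumptions the code descriptor is gdt_entry(0x9A, GDT_FLAG_LONG) and the data descriptor is gdt_entry(0x92, 0).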

Stage 2: Physical Trampoline (long_mode_phys)

After the far jump, the CPU executes in 64-bit mode but still at a physical address (within .text.boot). This stub:

  1. Sets all data segment registers (DS, ES, FS, GS, SS) to selector 0x10
  2. Loads the full 64-bit higher-half address of long_mode_high into RAX
  3. Jumps through RAX to cross the VMA gap

mov rax, long_mode_high    ; 64-bit higher-half VMA
jmp rax                    ; cross the VMA gap

This indirection carries the full 64-bit target address in a register, so the jump does not depend on a 32-bit relative displacement computed between the physical address in .text.boot and the higher-half .text symbols.

Stage 3: Higher-Half Entry (long_mode_high)

Now executing at 0xFFFFFFFF80xxxxxx, the kernel sets up the boot stack and calls C:

mov rsp, boot_stack_top    ; 16KB stack in .bss (higher-half VMA)
xor rbp, rbp               ; terminate stack unwinding
call kernel_main            ; kernel_main(mb_magic, mb_info)
.halt:
    hlt
    jmp .halt               ; unreachable

The boot stack is 16KB (16384 bytes), 16-byte aligned per the SysV AMD64 ABI requirement.
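For illustration, the same reservation can be written in C. This is a sketch only: the real stack is reserved in the assembly source, and boot_stack_top is written here as a hypothetical function rather than a label. Because the stack top is 16-byte aligned immediately before the call instruction, kernel_main is entered with the alignment the SysV AMD64 ABI expects.

```c
#include <stdint.h>

#define BOOT_STACK_SIZE 16384u

/* 16KB of zero-initialized storage (lands in .bss), 16-byte aligned. */
static _Alignas(16) uint8_t boot_stack[BOOT_STACK_SIZE];

/* The stack grows downward, so the initial RSP is one past the last byte. */
static inline uintptr_t boot_stack_top(void)
{
    return (uintptr_t)boot_stack + BOOT_STACK_SIZE;
}
```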

Stage 4: kernel_main Initialization Sequence

kernel_main (kernel/core/main.c) orchestrates all subsystem initialization in a carefully ordered sequence. Each subsystem that has a log tag prints a [TAG] OK line to serial and VGA on success; steps marked (none) in the tables below print nothing.

Early Hardware

Order  Function                 Log Tag    Purpose
1      arch_init()              (none)     serial_init() + vga_init()
2      arch_pat_init()          (none)     Program PAT MSR: PA1=WC for framebuffer
3      arch_mm_init(mb_info)    [CMDLINE]  Parse multiboot2 memory map, framebuffer, ACPI RSDP, modules, cmdline

arch_mm_init walks the multiboot2 tag stream (still accessible via the identity map) and populates:

  • Usable RAM regions (type=1 entries) for the PMM
  • Reserved regions (first 1MB, multiboot2 info, GRUB modules)
  • ACPI RSDP physical address (v2 preferred, v1 fallback)
  • Framebuffer info (addr, pitch, width, height, bpp)
  • Kernel command line (boot=text, quiet)
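A tag walk of this kind can be sketched as follows. The layout (8-byte info header with total_size and reserved, then tags of {type, size} padded to 8 bytes, terminated by a type-0 tag) and the tag type numbers come from the Multiboot2 specification. The function name mb2_find_tag is hypothetical and is not the Aegis implementation.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    MB2_TAG_END         = 0,
    MB2_TAG_CMDLINE     = 1,
    MB2_TAG_MODULE      = 3,
    MB2_TAG_MMAP        = 6,
    MB2_TAG_FRAMEBUFFER = 8,
    MB2_TAG_ACPI_OLD    = 14,  /* RSDP v1 */
    MB2_TAG_ACPI_NEW    = 15,  /* RSDP v2 */
};

struct mb2_tag {
    uint32_t type;
    uint32_t size;   /* includes this header, excludes trailing padding */
};

/* Return the first tag of the given type, or NULL if none is present. */
static const struct mb2_tag *mb2_find_tag(const void *info, uint32_t type)
{
    const uint8_t *base = info;
    uint32_t total_size = *(const uint32_t *)base;
    uint32_t off = 8;  /* skip total_size + reserved */

    while (off + sizeof(struct mb2_tag) <= total_size) {
        const struct mb2_tag *tag = (const struct mb2_tag *)(base + off);
        if (tag->type == MB2_TAG_END || tag->size < sizeof *tag)
            return NULL;
        if (tag->type == type)
            return tag;
        off += (tag->size + 7u) & ~7u;  /* next tag is 8-byte aligned */
    }
    return NULL;
}
```

Preferring the v2 RSDP with a v1 fallback then amounts to looking up MB2_TAG_ACPI_NEW first and MB2_TAG_ACPI_OLD only if that returns NULL.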

Memory Management

Order  Function                 Log Tag    Purpose
4      pmm_init()               [PMM]      Bitmap allocator: mark all reserved, free usable, re-reserve platform ranges + kernel image
5      vmm_init()               [VMM]      Build 5 new page tables (identity + higher-half), install mapped-window allocator, load CR3
6      kva_init()               [KVA]      Kernel virtual bump allocator starting at KERN_VMA + 0x800000
7      arch_set_master_pml4()   (none)     Store master PML4 phys for ISR/SYSCALL CR3 switching
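The three-pass strategy described for pmm_init() (mark everything reserved, free usable RAM, then re-reserve protected ranges) can be sketched with a one-bit-per-frame bitmap. The helper names and the bit polarity (set = reserved) are assumptions for illustration; only the 128KB bitmap size is taken from this page, which at 4KB per frame covers 4GB of physical memory.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE        4096ULL
#define PMM_BITMAP_BYTES (128u * 1024u)
#define PMM_MAX_FRAMES   (PMM_BITMAP_BYTES * 8ULL)

static uint8_t pmm_bitmap[PMM_BITMAP_BYTES];  /* bit set = frame reserved */

static void pmm_set_range(uint64_t first, uint64_t end, int reserved)
{
    if (end > PMM_MAX_FRAMES)
        end = PMM_MAX_FRAMES;
    for (uint64_t f = first; f < end; f++) {
        if (reserved)
            pmm_bitmap[f / 8] |= (uint8_t)(1u << (f % 8));
        else
            pmm_bitmap[f / 8] &= (uint8_t)~(1u << (f % 8));
    }
}

/* Pass 1: nothing is allocatable until proven usable. */
static void pmm_reserve_all(void)
{
    memset(pmm_bitmap, 0xFF, sizeof pmm_bitmap);
}

/* Pass 2: free only frames lying entirely inside the region (round inward). */
static void pmm_free_region(uint64_t base, uint64_t len)
{
    pmm_set_range((base + PAGE_SIZE - 1) / PAGE_SIZE, (base + len) / PAGE_SIZE, 0);
}

/* Pass 3: reserve every frame the region touches (round outward). */
static void pmm_reserve_region(uint64_t base, uint64_t len)
{
    pmm_set_range(base / PAGE_SIZE, (base + len + PAGE_SIZE - 1) / PAGE_SIZE, 1);
}

static int pmm_is_free(uint64_t pa)
{
    uint64_t f = pa / PAGE_SIZE;
    return f < PMM_MAX_FRAMES && !(pmm_bitmap[f / 8] & (1u << (f % 8)));
}
```

Starting from "all reserved" means that a gap or error in the firmware memory map results in unused RAM rather than in handing out memory that is not actually usable.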

Framebuffer & Display

Order  Function                 Log Tag    Purpose
8      fb_init()                [FB]       Map linear framebuffer via KVA with WC caching
9      fb_boot_splash()         (none)     Draw boot logo (graphical mode only)

Security & CPU Setup

The security subsystems initialized in this phase represent v1 implementations. While SMAP, SMEP, and the capability table provide meaningful protection layers, this is a from-scratch C kernel and should not be assumed to be free of exploitable vulnerabilities. The capability model (kernel/cap/) is notably the first kernel subsystem written in Rust, beginning a planned gradual migration of the kernel from C to Rust.

Order  Function                 Log Tag    Purpose
10     cap_init()               [CAP]      Initialize per-process capability table infrastructure (Rust)
11     smp_percpu_init_bsp()    [SMP]      Initialize per-CPU data for BSP
12     idt_init()               [IDT]      Install 256 interrupt gates, load IDTR
13     pic_init()               [PIC]      Remap 8259A: IRQ0-15 to vectors 0x20-0x2F, mask all
14     pit_init()               [PIT]      Program PIT channel 0 at 100 Hz, unmask IRQ0
15     kbd_init()               [KBD]      PS/2 keyboard, unmask IRQ1
16     ps2_mouse_init()         [MOUSE]    PS/2 mouse, unmask IRQ12
17     arch_gdt_init()          [GDT]      Runtime 7-entry GDT with ring-3 descriptors + TSS
18     arch_tss_init()          [TSS]      TSS RSP0 for ring-3 to ring-0 transitions
19     arch_syscall_init()      [SYSCALL]  Program STAR/LSTAR/SFMASK MSRs for SYSCALL/SYSRET
20     arch_smap_init()         [SMAP]     Supervisor Mode Access Prevention (CR4.SMAP)
21     arch_smep_init()         [SMEP]     Supervisor Mode Execution Prevention (CR4.SMEP)
22     arch_sse_init()          (none)     Enable SSE/SSE2 for user-mode processes
23     random_init()            [RNG]      ChaCha20 CSPRNG seeded from RDTSC
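SMEP and SMAP are each enabled by a single CR4 bit, and both are advertised in CPUID leaf 7 (subleaf 0), register EBX. The sketch below computes the CR4 value to write given the feature bits. The function name is hypothetical, and whether arch_smap_init() and arch_smep_init() gate on CPUID in exactly this way is an assumption.

```c
#include <stdint.h>

#define CR4_SMEP        (1ULL << 20)
#define CR4_SMAP        (1ULL << 21)
#define CPUID7_EBX_SMEP (1u << 7)
#define CPUID7_EBX_SMAP (1u << 20)

/* Set CR4.SMEP / CR4.SMAP only when the CPU reports support;
 * setting an unsupported CR4 bit raises #GP. */
static inline uint64_t cr4_enable_smep_smap(uint64_t cr4, uint32_t cpuid7_ebx)
{
    if (cpuid7_ebx & CPUID7_EBX_SMEP)
        cr4 |= CR4_SMEP;
    if (cpuid7_ebx & CPUID7_EBX_SMAP)
        cr4 |= CR4_SMAP;
    return cr4;
}
```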

Storage & Filesystems

Order  Function                 Log Tag    Purpose
24     ramdisk_init()           (none)     Map GRUB module 1 (rootfs) as ramdisk0 block device
25     ramdisk_init2()          (none)     Map GRUB module 2 (ESP image) as ramdisk1
26     vfs_init()               [VFS]      Virtual filesystem + initrd mount
27     console_init()           (none)     Register stdout device

ACPI & APIC

Order  Function                 Log Tag    Purpose
28     acpi_init()              [ACPI]     Parse MCFG (PCIe config) + MADT (APIC topology)
29     lapic_init()             [LAPIC]    Local APIC initialization
30     ioapic_init()            [IOAPIC]   I/O APIC initialization
31     i8042 flush              (none)     Drain stale scancodes from BIOS/GRUB keyboard buffer

PCI & Block Devices

Order  Function                 Log Tag    Purpose
32     pcie_init()              [PCIE]     Enumerate PCIe devices via ECAM
33     nvme_init()              [NVME]     NVMe block device
34     gpt_scan("nvme0")        [GPT]      GPT partition table scan
35     ext2_mount()             (none)     Mount ext2 root (ramdisk0 preferred, nvme0p1 fallback)
36     cap_policy_load()        (none)     Load policy capabilities from /etc/aegis/caps.d/

Network & USB

Order  Function                 Log Tag    Purpose
37     xhci_init()              [XHCI]     xHCI USB host controller
38     virtio_net_init()        [NET]      Virtio-net NIC driver
39     rtl8169_init()           [NET]      RTL8168/8169 NIC driver
40     net_init()               (none)     Protocol stack + ICMP self-test

SMP & Scheduler

Order  Function                 Log Tag    Purpose
41     smp_start_aps()          [SMP]      Wake Application Processors via INIT-SIPI-SIPI
42     sched_init()             (none)     Initialize run queue
43     sched_spawn(task_idle)   (none)     Create idle task (task 0)
44     proc_spawn_init()        (none)     Spawn /sbin/init in ring 3
45     vmm_teardown_identity()  [VMM]      Clear PML4[0], reload CR3 (identity map removed)
46     fb_boot_splash_end()     (none)     Clear splash screen, unlock framebuffer
47     sched_start()            [SCHED]    Context-switch into first task (never returns)

After sched_start(), the idle task enables the LAPIC timer and enters the halt loop. The scheduler preemptively multitasks between the idle task and the init process.

Boot Modes

The kernel command line controls the boot experience:

Cmdline               Behavior
boot=text             Text console, no splash, all printk output visible on framebuffer
boot=graphical quiet  Boot splash displayed, printk suppressed on framebuffer (serial only)
(none)                Default graphical boot with printk visible
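Recognizing these options only requires matching whole space-separated tokens on the command line. The sketch below shows one way to do that in C; struct boot_opts and both function names are hypothetical and do not reflect the actual Aegis parser.

```c
#include <stdbool.h>
#include <string.h>

struct boot_opts {
    bool text_mode;  /* boot=text */
    bool quiet;      /* quiet */
};

/* True if `token` appears as a complete space-separated word in `cmdline`. */
static bool cmdline_has_token(const char *cmdline, const char *token)
{
    size_t n = strlen(token);
    const char *p = cmdline;

    while (*p) {
        while (*p == ' ')
            p++;
        const char *end = p;
        while (*end && *end != ' ')
            end++;
        if ((size_t)(end - p) == n && n > 0 && memcmp(p, token, n) == 0)
            return true;
        p = end;
    }
    return false;
}

static struct boot_opts parse_boot_opts(const char *cmdline)
{
    struct boot_opts o = {
        .text_mode = cmdline_has_token(cmdline, "boot=text"),
        .quiet     = cmdline_has_token(cmdline, "quiet"),
    };
    return o;
}
```

Matching whole tokens rather than substrings keeps an option such as boot=textual from being mistaken for boot=text.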

Memory Layout at Boot

Physical Memory:
+------------------+ 0x000000
|  BIOS/VGA hole   |  Reserved (first 1MB)
+------------------+ 0x100000
|  Kernel image    |  .multiboot, .text.boot, .text, .rodata, .data
+------------------+
|  Kernel BSS      |  Page tables, pmm_bitmap (128KB), boot stack (16KB)
+------------------+ _kernel_end (physical)
|  Free RAM        |  Managed by PMM
+------------------+

Virtual Memory (after vmm_init):
+------------------+ 0x0000000000000000
|  Identity map    |  [0..1GB) -> PA [0..1GB)  (removed at step 45)
+------------------+
     ... gap ...
+------------------+ 0xFFFFFFFF80000000  (KERN_VMA)
|  Kernel image    |  .text, .rodata, .data (2MB huge pages 0-1)
+------------------+ 0xFFFFFFFF80400000
|  Kernel BSS      |  2MB huge page 2
+------------------+ 0xFFFFFFFF80600000  (VMM_WINDOW_VA)
|  Mapped window   |  2 x 4KB PTE slots for page table manipulation
+------------------+ 0xFFFFFFFF80800000  (KVA_BASE)
|  KVA region      |  Bump-allocated kernel objects (TCBs, stacks, etc.)
+------------------+
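The KVA region at the top of this layout is handed out by a bump allocator. The sketch below shows only the address-reservation half of such an allocator; the name kva_reserve_pages is hypothetical, and a real implementation would also map physical frames behind the returned range.

```c
#include <stdint.h>

#define KERN_VMA  0xFFFFFFFF80000000ULL
#define KVA_BASE  (KERN_VMA + 0x800000ULL)
#define PAGE_SIZE 4096ULL

static uint64_t kva_next = KVA_BASE;

/* Reserve `npages` contiguous pages of kernel virtual address space.
 * Returns the base VA, or 0 if the request is empty or would overflow. */
static uint64_t kva_reserve_pages(uint64_t npages)
{
    uint64_t bytes = npages * PAGE_SIZE;

    if (npages == 0 || bytes / PAGE_SIZE != npages || kva_next + bytes < kva_next)
        return 0;

    uint64_t va = kva_next;
    kva_next += bytes;   /* bump: addresses are never reused */
    return va;
}
```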

AP Bootstrap (SMP)

Application Processors follow a separate boot path. The BSP copies the AP trampoline (ap_trampoline.asm) to physical address 0x8000 and sends INIT-SIPI-SIPI sequences.

Each AP wakes in 16-bit real mode at CS=0x0800, IP=0x0000 (linear 0x8000) and transitions through:

  1. Real mode – Enable A20 line, load temporary GDT, set CR0.PE
  2. 32-bit protected mode – Enable PAE, load CR3 (shared kernel PML4), enable long mode + paging
  3. 64-bit long mode – Set segments, read LAPIC ID via CPUID, pick per-CPU stack from table
  4. Jump to ap_entry() – Higher-half C entry point for per-CPU initialization (GDT, IDT, LAPIC)

The AP trampoline embeds its own 5-entry GDT (null, 32-bit code, 32-bit data, 64-bit code, 64-bit data) and a data area filled by the BSP before SIPI:

Field          Size           Purpose
ap_pml4        4 bytes        Physical address of kernel PML4
ap_entry_addr  8 bytes        64-bit VA of ap_entry()
ap_stacks      256 x 8 bytes  Per-CPU kernel stack tops (indexed by LAPIC ID)
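The relationship between the trampoline address 0x8000 and the AP's starting CS:IP follows from how a Startup IPI encodes its target: the 8-bit vector is a 4KB page number, and the AP begins at CS = vector << 8, IP = 0. A small worked sketch in C (the helper names are hypothetical):

```c
#include <stdint.h>

#define AP_TRAMPOLINE_PA 0x8000u

/* SIPI vector = physical page number of the trampoline (must be below 1MB). */
static inline uint8_t sipi_vector(uint32_t trampoline_pa)
{
    return (uint8_t)(trampoline_pa >> 12);
}

/* The AP starts executing at CS = vector << 8, IP = 0. */
static inline uint16_t sipi_start_cs(uint8_t vector)
{
    return (uint16_t)((unsigned)vector << 8);
}

/* Real-mode linear address = CS * 16 + IP. */
static inline uint32_t real_mode_linear(uint16_t cs, uint16_t ip)
{
    return ((uint32_t)cs << 4) + ip;
}
```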

Runtime GDT Layout

After arch_gdt_init() replaces the boot GDT, the runtime GDT has 7 entries:

Index  Selector  RPL=3  Description
0      0x00      -      Null
1      0x08      -      Kernel code (DPL=0, L=1)
2      0x10      -      Kernel data (DPL=0)
3      0x18      0x1B   User data (DPL=3) – must precede user code
4      0x20      0x23   User code (DPL=3)
5-6    0x28      -      TSS (16-byte system descriptor)

The user data/code ordering is critical for SYSRET. STAR MSR [63:48] = 0x10 causes SYSRET to derive:

  • SS = (0x10 + 8) | 3 = 0x1B (GDT[3] = user data)
  • CS = (0x10 + 16) | 3 = 0x23 (GDT[4] = user code)

Each CPU gets its own GDT copy because LTR sets the Busy bit in the TSS descriptor, and executing LTR on a descriptor that is already marked busy raises a general-protection fault.
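The SYSRET selector arithmetic described above can be checked with a few lines of C. The helper names are hypothetical, and the SYSCALL selector base of 0x08 in STAR[47:32] is an assumption based on the kernel code/data positions in the runtime GDT, not a value stated on this page.

```c
#include <stdint.h>

/* Compose STAR: [47:32] = SYSCALL CS base, [63:48] = SYSRET selector base. */
static inline uint64_t make_star(uint16_t syscall_cs, uint16_t sysret_base)
{
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_cs << 32);
}

static inline uint16_t star_sysret_base(uint64_t star)
{
    return (uint16_t)(star >> 48);
}

/* SYSRET loads SS from base + 8, forcing RPL = 3. */
static inline uint16_t sysret_ss(uint64_t star)
{
    return (uint16_t)((star_sysret_base(star) + 8) | 3);
}

/* 64-bit SYSRET loads CS from base + 16, forcing RPL = 3. */
static inline uint16_t sysret_cs64(uint64_t star)
{
    return (uint16_t)((star_sysret_base(star) + 16) | 3);
}
```

Because SS is taken from the slot below CS, the user data descriptor has to sit at GDT[3] and the user code descriptor at GDT[4].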
