Boot Process
Multiboot2 entry, 32-to-64-bit mode transition, higher-half trampoline, and kernel_main initialization sequence
Aegis boots via the Multiboot2 protocol. GRUB loads the kernel ELF image at physical address 0x100000 and hands off control in 32-bit protected mode. Aegis then enables PAE paging, switches to 64-bit long mode, and jumps to a higher-half kernel mapped at 0xFFFFFFFF80000000.
This page documents the complete boot path from the first instruction after GRUB to the scheduler’s first context switch.
v1 maturity notice. Aegis v1 is the first version deemed ready for public release – it is not a mature or production-hardened system. The boot sequence initializes security features like SMAP, SMEP, and the capability model, but these are v1 implementations in a predominantly C codebase. There are likely exploitable vulnerabilities throughout the kernel, as would be expected with any from-scratch OS at this stage. The kernel is undergoing a gradual translation from C to Rust; kernel/cap/ (the capability system) is already written in Rust and represents the beginning of this migration path. Contributions are welcome – file issues or propose changes at exec/aegis.
Linker Layout
The linker script (tools/linker.ld) defines two address regimes:
OUTPUT_FORMAT("elf64-x86-64")
ENTRY(_start)
PHYS_BASE = 0x100000;
KERN_VMA = 0xFFFFFFFF80000000;
SECTIONS {
. = PHYS_BASE;
.multiboot : { KEEP(*(.multiboot)) } /* VMA = LMA = physical */
.text.boot : { *(.text.boot) } /* VMA = LMA = physical */
. += KERN_VMA;
.text : AT(ADDR(.text) - KERN_VMA) { ... }
.rodata : AT(ADDR(.rodata) - KERN_VMA) { ... }
.data : AT(ADDR(.data) - KERN_VMA) { ... }
.bss : AT(ADDR(.bss) - KERN_VMA) { ... }
_kernel_end = .;
}
Key properties:
| Section | VMA | LMA | Purpose |
|---|---|---|---|
| .multiboot | Physical | Physical | Multiboot2 header (kept within the first 8KB; the Multiboot2 specification requires it within the first 32768 bytes) |
| .text.boot | Physical | Physical | 32-bit entry, GDT, physical trampoline |
| .text | KERN_VMA + offset | Physical | 64-bit kernel code (higher-half) |
| .rodata/.data/.bss | KERN_VMA + offset | Physical | Kernel data (higher-half) |
The .multiboot and .text.boot sections have VMA = LMA so that code executing before paging can reference labels directly. All other sections have higher-half VMAs with physical LMAs via the AT() directive.
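The VMA/LMA relationship for the higher-half sections reduces to a single offset. A minimal C sketch, assuming the PHYS_BASE and KERN_VMA values from the linker script above; the helper names are hypothetical and not taken from the Aegis source:

```c
#include <stdint.h>

#define PHYS_BASE 0x100000ULL
#define KERN_VMA  0xFFFFFFFF80000000ULL

/* Load (physical) address of a higher-half symbol: LMA = VMA - KERN_VMA. */
uint64_t kern_virt_to_phys(uint64_t vma)
{
    return vma - KERN_VMA;
}

/* Inverse mapping: where a physical kernel address appears in the higher half. */
uint64_t kern_phys_to_virt(uint64_t lma)
{
    return lma + KERN_VMA;
}
```

This is the same subtraction the AT(ADDR(...) - KERN_VMA) directives perform at link time, and the one the boot code applies when it needs the physical address of a .bss label.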
Multiboot2 Header
The multiboot2 header (boot.asm, section .multiboot) is placed within the first 8KB of the binary:
multiboot_header_start:
dd MULTIBOOT2_MAGIC ; 0xE85250D6
dd MULTIBOOT2_ARCH ; 0 = i386 (32-bit protected mode entry)
dd (multiboot_header_end - multiboot_header_start)
dd -(MULTIBOOT2_MAGIC + MULTIBOOT2_ARCH + ...) ; checksum
; Framebuffer request tag (type=5)
dw 5 ; MULTIBOOT_HEADER_TAG_FRAMEBUFFER
dw 0 ; required (not optional)
dd 20 ; tag size
dd 0 ; width = any (native resolution)
dd 0 ; height = any
dd 32 ; depth = 32bpp required
; End tag
dw 0
dw 0
dd 8
multiboot_header_end:
The framebuffer tag requests a 32-bpp linear framebuffer at the native resolution. GRUB honors this together with gfxpayload=keep in grub.cfg.
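The checksum field follows the Multiboot2 rule that magic, architecture, header length, and checksum sum to zero modulo 2^32. The sketch below shows that arithmetic in C, along with the 8-byte rounding the specification applies to header tags. The function names are hypothetical, and the header length is left as a parameter because the exact length of the Aegis header (including any alignment padding in boot.asm) is not shown here:

```c
#include <stdint.h>

#define MULTIBOOT2_MAGIC 0xE85250D6u
#define MULTIBOOT2_ARCH  0u   /* i386, 32-bit protected mode entry */

/* Checksum field: magic + arch + header_length + checksum == 0 (mod 2^32). */
uint32_t mb2_header_checksum(uint32_t header_length)
{
    return (uint32_t)(0u - (MULTIBOOT2_MAGIC + MULTIBOOT2_ARCH + header_length));
}

/* Each tag starts on an 8-byte boundary, so a tag's footprint in the
 * header is its size field rounded up to a multiple of 8. */
uint32_t mb2_tag_footprint(uint32_t tag_size)
{
    return (tag_size + 7u) & ~7u;
}
```

Under this rounding rule, the 20-byte framebuffer tag occupies 24 bytes before the end tag begins.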
Stage 1: 32-bit Entry (_start)
GRUB transfers control to _start in 32-bit protected mode with:
| Register | Value | Purpose |
|---|---|---|
| EAX | 0x36D76289 | Multiboot2 magic |
| EBX | Physical address | Pointer to multiboot2 info structure |
| CPU | Protected mode | Interrupts disabled, paging off |
Preserving Bootloader Arguments
The first two instructions save the multiboot2 arguments into the SysV AMD64 ABI parameter registers. These registers survive the entire mode transition untouched:
mov edi, eax ; mb_magic -> RDI (first arg)
mov esi, ebx ; mb_info -> RSI (second arg)
Building Page Tables
Five page tables are allocated from .bss (zeroed by GRUB at load time). Since .bss labels have higher-half VMAs, the code uses (label - KERN_VMA) to compute physical addresses at assemble time:
PML4 Table (pml4_table)
[0] -> pdpt_lo (identity map)
[511] -> pdpt_hi (higher-half kernel)
pdpt_lo
[0] -> pd_lo
pdpt_hi
[510] -> pd_hi (PDPT_HI_IDX = (KERN_VMA >> 30) & 0x1FF = 510)
pd_lo
[0..511] -> 512 x 2MB huge pages (identity: PA 0..1GB)
pd_hi
[0..3] -> 4 x 2MB huge pages (kernel: PA 0..8MB)
The identity map covers the first 1GB via 512 huge pages in pd_lo. The higher-half map covers 8MB (four 2MB pages in pd_hi), sufficient for the kernel binary, BSS, and any GRUB-placed multiboot2 info.
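The table indices above fall out of the virtual address directly: each paging level consumes 9 bits. The following C sketch reproduces that arithmetic and shows how a 2MB huge-page entry could be formed. The flag combination (present, writable, page-size) is an assumption, since the exact flags used by boot.asm are not listed here, and the helper names are hypothetical:

```c
#include <stdint.h>

#define KERN_VMA    0xFFFFFFFF80000000ULL
#define HUGE_2MB    0x200000ULL

/* Architectural page-table entry bits. */
#define PTE_PRESENT 0x001ULL
#define PTE_WRITE   0x002ULL
#define PTE_HUGE    0x080ULL   /* PS bit: a PD entry maps a 2MB page */

/* Each paging level indexes with 9 bits of the virtual address. */
unsigned pml4_index(uint64_t va) { return (unsigned)((va >> 39) & 0x1FF); }
unsigned pdpt_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1FF); }
unsigned pd_index(uint64_t va)   { return (unsigned)((va >> 21) & 0x1FF); }

/* Page-directory entry mapping the i-th 2MB page starting at PA 0
 * (assumed flags: present + writable + huge). */
uint64_t pd_huge_entry(unsigned i)
{
    return ((uint64_t)i * HUGE_2MB) | PTE_PRESENT | PTE_WRITE | PTE_HUGE;
}
```

For KERN_VMA this yields PML4 slot 511, PDPT slot 510, and PD slot 0, matching the layout shown above.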
Enabling Long Mode
The mode transition follows the Intel-prescribed sequence:
1. CR4.PAE = 1 – Enable Physical Address Extension
2. CR3 = pml4_table (phys) – Load page table root
3. EFER.LME = 1 – Enable Long Mode; EFER.NXE = 1 – Enable No-Execute page support
4. GDT loaded via lgdt – 64-bit code/data descriptors
5. CR0.PG = 1 – Enable paging (activates long mode); CR0.WP = 1 – Write-protect for ring-0
6. Far jump 0x08:long_mode_phys – Reload CS, enter 64-bit mode
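For reference, the register bits touched by this sequence are shown below as a C sketch. The bit positions are the architectural ones; the helper names are hypothetical, and the real code performs these updates in assembly with mov to CR0/CR4 and rdmsr/wrmsr on EFER:

```c
#include <stdint.h>

#define CR4_PAE   (1ULL << 5)    /* Physical Address Extension */
#define MSR_EFER  0xC0000080u    /* Extended Feature Enable Register */
#define EFER_LME  (1ULL << 8)    /* Long Mode Enable */
#define EFER_NXE  (1ULL << 11)   /* No-Execute Enable */
#define CR0_WP    (1ULL << 16)   /* Write Protect (ring 0 honors read-only pages) */
#define CR0_PG    (1ULL << 31)   /* Paging */

/* Step 1: new CR4 value with PAE set. */
uint64_t cr4_with_pae(uint64_t cr4)
{
    return cr4 | CR4_PAE;
}

/* Step 3: new EFER value with long mode and NX support enabled. */
uint64_t efer_with_long_mode(uint64_t efer)
{
    return efer | EFER_LME | EFER_NXE;
}

/* Step 5: new CR0 value; setting PG while EFER.LME = 1 activates long mode. */
uint64_t cr0_with_paging(uint64_t cr0)
{
    return cr0 | CR0_PG | CR0_WP;
}
```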
The GDT used during boot lives in .text.boot (physical VMA) so lgdt works before paging is active:
| Index | Selector | Description |
|---|---|---|
| 0 | 0x00 | Null descriptor |
| 1 | 0x08 | 64-bit code: P=1, DPL=0, L=1, Type=code/execute/read |
| 2 | 0x10 | 64-bit data: P=1, DPL=0, Type=data/read/write |
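The descriptor attributes in the table can be encoded as 64-bit values. The sketch below builds flat descriptors with base and limit left at zero, which is an assumption: the actual bytes in the Aegis boot GDT are not reproduced here and may set the limit fields. The helper names are hypothetical:

```c
#include <stdint.h>

/* Access-byte bits (descriptor bits 40..47). */
#define GDT_PRESENT   0x80u   /* P = 1 */
#define GDT_CODE_DATA 0x10u   /* S = 1: code or data segment */
#define GDT_EXEC      0x08u   /* executable (code segment) */
#define GDT_RW        0x02u   /* readable code / writable data */

/* Flags nibble (descriptor bits 52..55). */
#define GDT_FLAG_LONG 0x2u    /* L = 1: 64-bit code segment */

/* Flat descriptor with base = 0 and limit = 0 (assumed for this sketch). */
uint64_t gdt_flat_descriptor(uint8_t access, uint8_t flags)
{
    return ((uint64_t)access << 40) | ((uint64_t)(flags & 0xFu) << 52);
}

/* Selector = GDT index * 8, with the requested privilege level in bits 0..1. */
uint16_t gdt_selector(unsigned index, unsigned rpl)
{
    return (uint16_t)((index << 3) | (rpl & 3u));
}
```

With these definitions, index 1 maps to selector 0x08 and index 2 to 0x10, as in the table.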
Stage 2: Physical Trampoline (long_mode_phys)
After the far jump, the CPU executes in 64-bit mode but still at a physical address (within .text.boot). This stub:
- Sets all data segment registers (DS, ES, FS, GS, SS) to selector 0x10
- Loads the full 64-bit higher-half address of long_mode_high into RAX
- Jumps through RAX to cross the VMA gap
mov rax, long_mode_high ; 64-bit higher-half VMA
jmp rax ; cross the VMA gap
This indirection is necessary because RIP-relative addressing cannot reach .text symbols from the physical address in .text.boot.
Stage 3: Higher-Half Entry (long_mode_high)
Now executing at 0xFFFFFFFF80xxxxxx, the kernel sets up the boot stack and calls C:
mov rsp, boot_stack_top ; 16KB stack in .bss (higher-half VMA)
xor rbp, rbp ; terminate stack unwinding
call kernel_main ; kernel_main(mb_magic, mb_info)
.halt:
hlt
jmp .halt ; unreachable
The boot stack is 16KB (16384 bytes), 16-byte aligned per the SysV AMD64 ABI requirement.
Stage 4: kernel_main Initialization Sequence
kernel_main (kernel/core/main.c) orchestrates all subsystem initialization in a carefully ordered sequence. Each subsystem prints a [TAG] OK line to serial and VGA on success.
Early Hardware
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 1 | arch_init() | (none) | serial_init() + vga_init() |
| 2 | arch_pat_init() | (none) | Program PAT MSR: PA1=WC for framebuffer |
| 3 | arch_mm_init(mb_info) | [CMDLINE] | Parse multiboot2 memory map, framebuffer, ACPI RSDP, modules, cmdline |
arch_mm_init walks the multiboot2 tag stream (still accessible via the identity map) and populates:
- Usable RAM regions (type=1 entries) for the PMM
- Reserved regions (first 1MB, multiboot2 info, GRUB modules)
- ACPI RSDP physical address (v2 preferred, v1 fallback)
- Framebuffer info (addr, pitch, width, height, bpp)
- Kernel command line (boot=text, quiet)
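A tag walk of this kind can be sketched in C as follows. The layout (an 8-byte header followed by 8-byte-aligned tags, each starting with a 32-bit type and size, terminated by a type-0 tag) and the tag type numbers come from the Multiboot2 specification. The function names and the sample buffer are hypothetical and do not reflect how arch_mm_init is actually written:

```c
#include <stdint.h>

#define MB2_TAG_END      0u
#define MB2_TAG_CMDLINE  1u
#define MB2_TAG_MODULE   3u
#define MB2_TAG_MMAP     6u
#define MB2_TAG_FRAMEBUF 8u
#define MB2_TAG_RSDP_V1  14u
#define MB2_TAG_RSDP_V2  15u

/* Read a little-endian u32 without assuming pointer alignment. */
uint32_t mb2_read32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the byte offset of the first tag of `type`, or -1 if absent. */
long mb2_find_tag(const uint8_t *info, uint32_t type)
{
    uint32_t total = mb2_read32(info);      /* total_size header field */
    uint32_t off = 8;                       /* skip total_size + reserved */

    while (off + 8 <= total) {
        uint32_t tag_type = mb2_read32(info + off);
        uint32_t tag_size = mb2_read32(info + off + 4);

        if (tag_type == type)
            return (long)off;
        if (tag_type == MB2_TAG_END || tag_size < 8)
            break;                          /* end tag or malformed size */
        off += (tag_size + 7u) & ~7u;       /* next tag is 8-byte aligned */
    }
    return -1;
}

/* Hypothetical sample: one cmdline tag ("quiet") followed by the end tag. */
const uint8_t mb2_sample[32] = {
    32, 0, 0, 0,   0, 0, 0, 0,              /* total_size = 32, reserved */
    1, 0, 0, 0,    14, 0, 0, 0,             /* type = cmdline, size = 8 + 6 */
    'q', 'u', 'i', 'e', 't', 0, 0, 0,       /* string + 2 bytes of padding */
    0, 0, 0, 0,    8, 0, 0, 0               /* end tag: type 0, size 8 */
};
```

Preferring the v2 RSDP over v1, as described above, would amount to looking up MB2_TAG_RSDP_V2 first and falling back to MB2_TAG_RSDP_V1 when the result is -1.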
Memory Management
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 4 | pmm_init() | [PMM] | Bitmap allocator: mark all reserved, free usable, re-reserve platform ranges + kernel image |
| 5 | vmm_init() | [VMM] | Build 5 new page tables (identity + higher-half), install mapped-window allocator, load CR3 |
| 6 | kva_init() | [KVA] | Kernel virtual bump allocator starting at KERN_VMA + 0x800000 |
| 7 | arch_set_master_pml4() | (none) | Store master PML4 phys for ISR/SYSCALL CR3 switching |
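The three-pass pmm_init() strategy (reserve everything, free usable RAM, re-reserve protected ranges) can be illustrated with a small bitmap. This is a sketch under stated assumptions rather than the Aegis implementation: it tracks only 4MB, uses 4KB pages, takes a single usable region, and all names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096ULL
#define PMM_PAGES 1024ULL                 /* sketch tracks only 4MB of RAM */

uint8_t pmm_bitmap[PMM_PAGES / 8];        /* one bit per page, 1 = reserved */

/* Mark [start, end). Reserving rounds outward to whole pages; freeing
 * rounds inward so a partially usable page is never handed out. */
void pmm_mark(uint64_t start, uint64_t end, int reserved)
{
    uint64_t first = reserved ? start / PAGE_SIZE
                              : (start + PAGE_SIZE - 1) / PAGE_SIZE;
    uint64_t last  = reserved ? (end + PAGE_SIZE - 1) / PAGE_SIZE
                              : end / PAGE_SIZE;
    if (last > PMM_PAGES)
        last = PMM_PAGES;
    for (uint64_t page = first; page < last; page++) {
        uint8_t bit = (uint8_t)(1u << (page % 8));
        if (reserved)
            pmm_bitmap[page / 8] |= bit;
        else
            pmm_bitmap[page / 8] &= (uint8_t)~bit;
    }
}

int pmm_is_reserved(uint64_t addr)
{
    uint64_t page = addr / PAGE_SIZE;
    if (page >= PMM_PAGES)
        return 1;                         /* untracked memory is never free */
    return (pmm_bitmap[page / 8] >> (page % 8)) & 1;
}

/* Runs the three passes and returns the number of free pages. */
unsigned pmm_init_sketch(uint64_t usable_start, uint64_t usable_end,
                         uint64_t kernel_start, uint64_t kernel_end)
{
    unsigned free_pages = 0;

    memset(pmm_bitmap, 0xFF, sizeof pmm_bitmap);   /* 1. all reserved */
    pmm_mark(usable_start, usable_end, 0);         /* 2. free usable RAM */
    pmm_mark(0, 0x100000, 1);                      /* 3. re-reserve first 1MB */
    pmm_mark(kernel_start, kernel_end, 1);         /*    and the kernel image */

    for (uint64_t p = 0; p < PMM_PAGES; p++)
        free_pages += !pmm_is_reserved(p * PAGE_SIZE);
    return free_pages;
}
```

Starting from "all reserved" means any range the memory map fails to mention stays unavailable by default. If each bit tracks one 4KB page, a 128KB bitmap would cover 4GB of physical memory.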
Framebuffer & Display
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 8 | fb_init() | [FB] | Map linear framebuffer via KVA with WC caching |
| 9 | fb_boot_splash() | (none) | Draw boot logo (graphical mode only) |
Security & CPU Setup
The security subsystems initialized in this phase represent v1 implementations. While SMAP, SMEP, and the capability table provide meaningful protection layers, this is a from-scratch C kernel and should not be assumed to be free of exploitable vulnerabilities. The capability model (kernel/cap/) is notably the first kernel subsystem written in Rust, beginning a planned gradual migration of the kernel from C to Rust.
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 10 | cap_init() | [CAP] | Initialize per-process capability table infrastructure (Rust) |
| 11 | smp_percpu_init_bsp() | [SMP] | Initialize per-CPU data for BSP |
| 12 | idt_init() | [IDT] | Install 256 interrupt gates, load IDTR |
| 13 | pic_init() | [PIC] | Remap 8259A: IRQ0-15 to vectors 0x20-0x2F, mask all |
| 14 | pit_init() | [PIT] | Program PIT channel 0 at 100 Hz, unmask IRQ0 |
| 15 | kbd_init() | [KBD] | PS/2 keyboard, unmask IRQ1 |
| 16 | ps2_mouse_init() | [MOUSE] | PS/2 mouse, unmask IRQ12 |
| 17 | arch_gdt_init() | [GDT] | Runtime 7-entry GDT with ring-3 descriptors + TSS |
| 18 | arch_tss_init() | [TSS] | TSS RSP0 for ring-3 to ring-0 transitions |
| 19 | arch_syscall_init() | [SYSCALL] | Program STAR/LSTAR/SFMASK MSRs for SYSCALL/SYSRET |
| 20 | arch_smap_init() | [SMAP] | Supervisor Mode Access Prevention (CR4.SMAP) |
| 21 | arch_smep_init() | [SMEP] | Supervisor Mode Execution Prevention (CR4.SMEP) |
| 22 | arch_sse_init() | (none) | Enable SSE/SSE2 for user-mode processes |
| 23 | random_init() | [RNG] | ChaCha20 CSPRNG seeded from RDTSC |
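Several rows in this table involve small pieces of arithmetic. The C sketch below works through three of them: the PIT reload value for 100 Hz, the vector an IRQ arrives on after the PIC remap, and the CR4 bits for SMEP and SMAP. The constants are the standard hardware values; the helper names are hypothetical:

```c
#include <stdint.h>

#define PIT_INPUT_HZ    1193182u       /* PIT input clock */
#define PIC_VECTOR_BASE 0x20u          /* IRQ0 lands on vector 0x20 */
#define CR4_SMEP        (1ULL << 20)
#define CR4_SMAP        (1ULL << 21)

/* Channel-0 reload value for a target tick rate (integer division).
 * Rates below 19 Hz would not fit the 16-bit counter. */
uint16_t pit_divisor(uint32_t hz)
{
    return (uint16_t)(PIT_INPUT_HZ / hz);
}

/* Interrupt vector an IRQ line (0..15) is delivered on after the remap. */
uint8_t irq_to_vector(uint8_t irq)
{
    return (uint8_t)(PIC_VECTOR_BASE + irq);
}

/* CR4 value with both supervisor-mode protections enabled. */
uint64_t cr4_with_smap_smep(uint64_t cr4)
{
    return cr4 | CR4_SMAP | CR4_SMEP;
}
```

The remap keeps hardware interrupts clear of vectors 0x00-0x1F, which the CPU reserves for exceptions.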
Storage & Filesystems
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 24 | ramdisk_init() | (none) | Map GRUB module 1 (rootfs) as ramdisk0 block device |
| 25 | ramdisk_init2() | (none) | Map GRUB module 2 (ESP image) as ramdisk1 |
| 26 | vfs_init() | [VFS] | Virtual filesystem + initrd mount |
| 27 | console_init() | (none) | Register stdout device |
ACPI & APIC
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 28 | acpi_init() | [ACPI] | Parse MCFG (PCIe config) + MADT (APIC topology) |
| 29 | lapic_init() | [LAPIC] | Local APIC initialization |
| 30 | ioapic_init() | [IOAPIC] | I/O APIC initialization |
| 31 | i8042 flush | (none) | Drain stale scancodes from BIOS/GRUB keyboard buffer |
PCI & Block Devices
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 32 | pcie_init() | [PCIE] | Enumerate PCIe devices via ECAM |
| 33 | nvme_init() | [NVME] | NVMe block device |
| 34 | gpt_scan("nvme0") | [GPT] | GPT partition table scan |
| 35 | ext2_mount() | (none) | Mount ext2 root (ramdisk0 preferred, nvme0p1 fallback) |
| 36 | cap_policy_load() | (none) | Load policy capabilities from /etc/aegis/caps.d/ |
Network & USB
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 37 | xhci_init() | [XHCI] | xHCI USB host controller |
| 38 | virtio_net_init() | [NET] | Virtio-net NIC driver |
| 39 | rtl8169_init() | [NET] | RTL8168/8169 NIC driver |
| 40 | net_init() | (none) | Protocol stack + ICMP self-test |
SMP & Scheduler
| Order | Function | Log Tag | Purpose |
|---|---|---|---|
| 41 | smp_start_aps() | [SMP] | Wake Application Processors via INIT-SIPI-SIPI |
| 42 | sched_init() | (none) | Initialize run queue |
| 43 | sched_spawn(task_idle) | (none) | Create idle task (task 0) |
| 44 | proc_spawn_init() | (none) | Spawn /sbin/init in ring 3 |
| 45 | vmm_teardown_identity() | [VMM] | Clear PML4[0], reload CR3 (identity map removed) |
| 46 | fb_boot_splash_end() | (none) | Clear splash screen, unlock framebuffer |
| 47 | sched_start() | [SCHED] | Context-switch into first task (never returns) |
After sched_start(), the idle task enables the LAPIC timer and enters the halt loop. The scheduler preemptively multitasks between the idle task and the init process.
Boot Modes
The kernel command line controls the boot experience:
| Cmdline | Behavior |
|---|---|
| boot=text | Text console, no splash, all printk output visible on framebuffer |
| boot=graphical quiet | Boot splash displayed, printk suppressed on framebuffer (serial only) |
| (none) | Default graphical boot with printk visible |
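One way to derive these modes from the command line is a whole-token search, sketched below in C. The struct and function names are hypothetical; how Aegis actually parses the string is not shown in this page. The sketch assumes tokens are separated by single spaces:

```c
#include <stdbool.h>
#include <string.h>

struct boot_opts {
    bool text_mode;   /* boot=text: console only, no splash */
    bool quiet;       /* quiet: printk goes to serial only */
};

/* True if `word` appears in `cmdline` as a whole space-delimited token.
 * `word` must be non-empty. */
bool cmdline_has(const char *cmdline, const char *word)
{
    size_t n = strlen(word);
    const char *p = cmdline;

    while ((p = strstr(p, word)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[n] == '\0') || (p[n] == ' ');
        if (starts && ends)
            return true;
        p += n;                       /* keep searching past this match */
    }
    return false;
}

/* An empty command line yields the default: graphical, printk visible. */
struct boot_opts parse_boot_opts(const char *cmdline)
{
    struct boot_opts o;
    o.text_mode = cmdline_has(cmdline, "boot=text");
    o.quiet = cmdline_has(cmdline, "quiet");
    return o;
}
```

Matching whole tokens rather than substrings keeps a value such as notquiet from being mistaken for the quiet flag.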
Memory Layout at Boot
Physical Memory:
+------------------+ 0x000000
| BIOS/VGA hole | Reserved (first 1MB)
+------------------+ 0x100000
| Kernel image | .multiboot, .text.boot, .text, .rodata, .data
+------------------+
| Kernel BSS | Page tables, pmm_bitmap (128KB), boot stack (16KB)
+------------------+ _kernel_end (physical)
| Free RAM | Managed by PMM
+------------------+
Virtual Memory (after vmm_init):
+------------------+ 0x0000000000000000
| Identity map | [0..1GB) -> PA [0..1GB) (removed at step 45)
+------------------+
... gap ...
+------------------+ 0xFFFFFFFF80000000 (KERN_VMA)
| Kernel image | .text, .rodata, .data (2MB huge pages 0-1)
+------------------+ 0xFFFFFFFF80400000
| Kernel BSS | 2MB huge page 2
+------------------+ 0xFFFFFFFF80600000 (VMM_WINDOW_VA)
| Mapped window | 2 x 4KB PTE slots for page table manipulation
+------------------+ 0xFFFFFFFF80800000 (KVA_BASE)
| KVA region | Bump-allocated kernel objects (TCBs, stacks, etc.)
+------------------+
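The virtual addresses in the diagram are fixed offsets from KERN_VMA, and the KVA region grows upward from KVA_BASE. The C sketch below shows the offsets and the bump-pointer idea only. It does not map any physical pages, and kva_next and kva_alloc_pages are hypothetical names rather than the Aegis API:

```c
#include <stdint.h>

#define KERN_VMA      0xFFFFFFFF80000000ULL
#define VMM_WINDOW_VA (KERN_VMA + 0x600000ULL)   /* mapped-window slots */
#define KVA_BASE      (KERN_VMA + 0x800000ULL)   /* first KVA address */
#define PAGE_SIZE     4096ULL

/* Next unused virtual address in the KVA region. */
uint64_t kva_next = KVA_BASE;

/* Reserve `pages` contiguous virtual pages and return their base address.
 * A bump allocator only moves forward; nothing is ever returned to it. */
uint64_t kva_alloc_pages(uint64_t pages)
{
    uint64_t va = kva_next;
    kva_next += pages * PAGE_SIZE;
    return va;
}
```

KVA_BASE sits exactly at the end of the 8MB covered by the four boot-time huge pages, so bump allocations do not collide with the kernel image or BSS.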
AP Bootstrap (SMP)
Application Processors follow a separate boot path. The BSP copies the AP trampoline (ap_trampoline.asm) to physical address 0x8000 and sends INIT-SIPI-SIPI sequences.
Each AP wakes in 16-bit real mode at CS=0x0800, IP=0x0000 (linear 0x8000) and transitions through:
- Real mode – Enable A20 line, load temporary GDT, set CR0.PE
- 32-bit protected mode – Enable PAE, load CR3 (shared kernel PML4), enable long mode + paging
- 64-bit long mode – Set segments, read LAPIC ID via CPUID, pick per-CPU stack from table
- Jump to ap_entry() – Higher-half C entry point for per-CPU initialization (GDT, IDT, LAPIC)
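The AP start address follows from the SIPI vector: the vector byte is the trampoline's 4KB page number, and the AP begins executing at CS = vector << 8 with IP = 0. A short C sketch of that arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

#define AP_TRAMPOLINE_PA 0x8000u

/* SIPI vector byte: the 4KB page number of the trampoline (below 1MB). */
uint8_t sipi_vector(uint32_t trampoline_pa)
{
    return (uint8_t)(trampoline_pa >> 12);
}

/* Real-mode CS the AP starts with: vector << 8, with IP = 0. */
uint16_t sipi_start_cs(uint8_t vector)
{
    return (uint16_t)((uint16_t)vector << 8);
}

/* Real-mode linear address = CS * 16 + IP. */
uint32_t real_mode_linear(uint16_t cs, uint16_t ip)
{
    return ((uint32_t)cs << 4) + ip;
}
```

For the trampoline at 0x8000 this gives vector 0x08, CS=0x0800, and linear address 0x8000, which is why the trampoline must be copied to a page-aligned address below 1MB.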
The AP trampoline embeds its own 5-entry GDT (null, 32-bit code, 32-bit data, 64-bit code, 64-bit data) and a data area filled by the BSP before SIPI:
| Field | Size | Purpose |
|---|---|---|
| ap_pml4 | 4 bytes | Physical address of kernel PML4 |
| ap_entry_addr | 8 bytes | 64-bit VA of ap_entry() |
| ap_stacks | 256 x 8 bytes | Per-CPU kernel stack tops (indexed by LAPIC ID) |
Runtime GDT Layout
After arch_gdt_init() replaces the boot GDT, the runtime GDT has 7 entries:
| Index | Selector | RPL=3 | Description |
|---|---|---|---|
| 0 | 0x00 | – | Null |
| 1 | 0x08 | – | Kernel code (DPL=0, L=1) |
| 2 | 0x10 | – | Kernel data (DPL=0) |
| 3 | 0x18 | 0x1B | User data (DPL=3) – must precede user code |
| 4 | 0x20 | 0x23 | User code (DPL=3) |
| 5-6 | 0x28 | – | TSS (16-byte system descriptor) |
The user data/code ordering is critical for SYSRET. STAR MSR [63:48] = 0x10 causes SYSRET to derive:
- SS = (0x10 + 8) | 3 = 0x1B (GDT[3] = user data)
- CS = (0x10 + 16) | 3 = 0x23 (GDT[4] = user code)
Each CPU gets its own GDT copy because LTR sets the Busy bit in the TSS descriptor.
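The SYSRET selector derivation can be checked with a few lines of C. The SYSRET base of 0x10 comes from the text above. The SYSCALL base of 0x08 in STAR[47:32] is an assumption, chosen because SYSCALL loads CS from that field and SS from the field plus 8, which would match the kernel code and data slots in the table. The function names are hypothetical:

```c
#include <stdint.h>

#define STAR_SYSRET_BASE  0x10u   /* STAR[63:48], from the text above */
#define STAR_SYSCALL_BASE 0x08u   /* STAR[47:32], assumed: kernel code selector */

/* 64-bit SYSRET loads SS from base + 8 and CS from base + 16, both RPL 3. */
uint16_t sysret_ss(uint16_t base)
{
    return (uint16_t)((base + 8u) | 3u);
}

uint16_t sysret_cs64(uint16_t base)
{
    return (uint16_t)((base + 16u) | 3u);
}

/* Compose the selector fields of the STAR MSR (low 32 bits left at zero). */
uint64_t star_value(uint16_t syscall_base, uint16_t sysret_base)
{
    return ((uint64_t)sysret_base << 48) | ((uint64_t)syscall_base << 32);
}
```

Because the hardware fixes SS at base + 8 and the 64-bit CS at base + 16, the user data descriptor has to sit in the slot immediately before the user code descriptor.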
See Also
- Memory Management – PMM, VMM, and KVA subsystem details
- Interrupts & Exceptions – IDT setup, ISR dispatch, PIC/APIC
- Architecture Overview – x86-64 hardware abstraction
- Scheduler – Task management and context switching