Interrupts & Exceptions

Aegis handles all interrupts and CPU exceptions through a unified IDT with 256 entries. Hardware interrupts are routed through both the legacy 8259A PIC and the APIC (Local APIC + I/O APIC) for SMP support. The ISR entry path manages CR3 switching and SWAPGS for safe transitions between user and kernel address spaces.

v1 maturity notice. The interrupt and exception handling code is written in a mix of x86-64 assembly (isr.asm) and C (idt.c). While the ISR entry path handles SWAPGS, CR3 switching, and AMD SS normalization correctly for known cases, this is v1 software in a predominantly C codebase. The exception handler panics on all CPU faults rather than attempting recovery – there is no fault fixup table. Subtle bugs in the assembly entry path or the C dispatch logic could be exploitable. A gradual Rust migration is planned for the kernel; kernel/cap/ is already in Rust as the first step. Contributions are welcome – file issues or propose changes at exec/aegis.

Interrupt Descriptor Table (IDT)

Source: kernel/arch/x86_64/idt.c, kernel/arch/x86_64/idt.h

Gate Descriptor Format

Each IDT gate is a 16-byte x86-64 interrupt gate descriptor:

typedef struct {
    uint16_t offset_lo;   // Handler address bits [0:15]
    uint16_t selector;    // Kernel code segment (0x08)
    uint8_t  ist;         // Interrupt Stack Table index (0 = no IST)
    uint8_t  type_attr;   // 0x8E = present, DPL=0, 64-bit interrupt gate
    uint16_t offset_mid;  // Handler address bits [16:31]
    uint32_t offset_hi;   // Handler address bits [32:63]
    uint32_t zero;        // Reserved
} aegis_idt_gate_t;

All gates use type_attr = 0x8E (present, DPL=0, interrupt gate). Because these are interrupt gates rather than trap gates, the CPU clears IF on entry, so maskable interrupts cannot nest while a handler runs. NMIs and CPU exceptions can still occur inside a handler.
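As a minimal sketch of how a gate could be filled in, the code below splits a 64-bit handler address across the three offset fields and reassembles it. The struct mirrors aegis_idt_gate_t; gate_t, gate_fill and gate_handler_addr are hypothetical names used for illustration, and the real idt_gate_set in idt.c may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Same field order as aegis_idt_gate_t; 16 bytes with no padding. */
typedef struct {
    uint16_t offset_lo;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;
    uint32_t offset_hi;
    uint32_t zero;
} gate_t;

static_assert(sizeof(gate_t) == 16, "an x86-64 IDT gate is 16 bytes");

/* Hypothetical helper: describe a handler located at `addr`. */
static void gate_fill(gate_t *g, uint64_t addr)
{
    g->offset_lo  = (uint16_t)(addr & 0xFFFF);          /* bits [0:15]  */
    g->selector   = 0x08;                               /* kernel code  */
    g->ist        = 0;                                  /* no IST       */
    g->type_attr  = 0x8E;                               /* P, DPL=0, interrupt gate */
    g->offset_mid = (uint16_t)((addr >> 16) & 0xFFFF);  /* bits [16:31] */
    g->offset_hi  = (uint32_t)(addr >> 32);             /* bits [32:63] */
    g->zero       = 0;
}

/* Inverse of gate_fill: rebuild the handler address from a gate. */
static uint64_t gate_handler_addr(const gate_t *g)
{
    return (uint64_t)g->offset_lo
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_hi << 32);
}
```

Round-tripping an address through both helpers is a cheap way to catch a swapped offset_mid/offset_hi field.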

Vector Assignment

Range          Count  Purpose
0x00 – 0x1F    32     CPU exceptions
0x20 – 0x2F    16     Hardware IRQs (PIC-remapped)
0x30           1      LAPIC timer
0xF0 – 0xFD    14     Remapped PIC stubs (stale interrupt catchers)
0xFE           1      TLB shootdown IPI
0xFF           1      LAPIC spurious interrupt

Special Gates

Double fault (#DF, vector 8) uses IST1 – a dedicated interrupt stack. This prevents a triple fault when #DF occurs due to stack overflow or RSP corruption. IST field value 1 in the gate maps to tss.ist[0] (Intel uses 1-indexed IST in the gate, C array is 0-indexed).
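The off-by-one between the gate's IST field and the C array can be sketched as follows. The struct follows the architectural 64-bit TSS layout (104 bytes); the type name tss64_t, its field names, and the helper tss_set_ist are assumptions for illustration and are not taken from the Aegis source. The packed attribute is GCC/Clang syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural 64-bit TSS layout; names are illustrative. */
typedef struct __attribute__((packed)) {
    uint32_t reserved0;
    uint64_t rsp[3];      /* RSP0..RSP2 */
    uint64_t reserved1;
    uint64_t ist[7];      /* IST1..IST7 stored in ist[0]..ist[6] */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t iomap_base;
} tss64_t;

static_assert(sizeof(tss64_t) == 104, "64-bit TSS is 104 bytes");
static_assert(offsetof(tss64_t, ist) == 36, "IST1 starts at byte 36");

/* Hypothetical helper: `gate_ist` is the 1-based value stored in the
 * IDT gate; the C array slot is one lower. */
static void tss_set_ist(tss64_t *tss, unsigned gate_ist, uint64_t stack_top)
{
    assert(gate_ist >= 1 && gate_ist <= 7);
    tss->ist[gate_ist - 1] = stack_top;
}
```

With this convention, the #DF gate's ist = 1 selects whatever stack top was stored in ist[0].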

LAPIC spurious (0xFF) – per the Intel specification, an EOI must not be sent for spurious interrupts.

Initialization

void idt_init(void) {
    // Install 48 ISR stubs (vectors 0-31 + IRQ 0x20-0x2F)
    for (int i = 0; i < 48; i++)
        idt_gate_set(i, isr_stubs[i]);

    idt_gate_set(0x30, isr_stub_lapic_timer);

    // Remapped PIC vectors 0xF0-0xFD
    for (int i = 0; i < 14; i++)
        idt_gate_set(0xF0 + i, isr_stubs_remap_pic[i]);

    idt_gate_set(0xFE, isr_stub_tlb_shootdown);
    idt_gate_set(0xFF, isr_stub_spurious);

    s_idt[8].ist = 1;   // #DF uses IST1

    __asm__ volatile("lidt %0" : : "m"(idtr));
}

The IDT is shared across all CPUs. APs call arch_load_idt() to reload the IDTR without reinitializing gate entries.

ISR Entry Path

Source: kernel/arch/x86_64/isr.asm

Stub Macros

Two macros generate per-vector entry stubs:

; For exceptions that DO NOT push an error code:
%macro ISR_NOERR 1
isr_%1:
    push qword 0       ; fake error code (uniform frame)
    push qword %1      ; vector number
    jmp isr_common_stub
%endmacro

; For exceptions that push an error code:
%macro ISR_ERR 1
isr_%1:
    push qword %1      ; vector number (error code already pushed by CPU)
    jmp isr_common_stub
%endmacro

Exception Error Code Classification

With error code:     #DF (8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), #CP (21), #VC (29), #SX (30)
Without error code:  #DE (0), #DB (1), NMI (2), #BP (3), #OF (4), #BR (5), #UD (6), #NM (7), all reserved, all IRQs
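The classification above fits in a 32-bit mask, which is handy for checking that each vector uses the right stub macro. This is a sketch only; ERRCODE_VECTOR_MASK and vector_pushes_error_code are hypothetical names, not part of isr.asm or idt.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per exception vector for which the CPU pushes an error code:
 * 8, 10, 11, 12, 13, 14, 17, 21, 29, 30. */
#define ERRCODE_VECTOR_MASK \
    ((1u << 8)  | (1u << 10) | (1u << 11) | (1u << 12) | (1u << 13) | \
     (1u << 14) | (1u << 17) | (1u << 21) | (1u << 29) | (1u << 30))

/* Returns true if the stub for `vector` should be ISR_ERR (the CPU
 * pushes the error code) rather than ISR_NOERR (stub pushes a fake 0). */
static bool vector_pushes_error_code(unsigned vector)
{
    assert(vector < 256);
    if (vector >= 32)          /* IRQs and IPIs never push an error code */
        return false;
    return ((ERRCODE_VECTOR_MASK >> vector) & 1u) != 0;
}
```

Picking the wrong macro shifts every field in the frame by 8 bytes, so a check like this is worth running over all 256 vectors.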

isr_common_stub – Full Entry Sequence

The common stub manages the complete transition from interrupted context to C handler:

1. SWAPGS (conditional)    If CS on stack == 0x23 (user code), swap GS.base
                           so kernel GS points to percpu_t
2. Push all GPRs           rax, rbx, rcx, rdx, rsi, rdi, rbp, r8-r15
3. Save CR3                Push current CR3 onto stack
4. Switch to master PML4   If g_master_pml4 != 0 and CR3 != master, load master
5. Call isr_dispatch()     Pass cpu_state_t* (RSP+8, skipping saved CR3)
6. Call signal_deliver()   Check pending signals before ring-3 return
7. Restore CR3             Pop saved CR3, reload if non-zero
8. Restore all GPRs        Pop in reverse order
9. SWAPGS (conditional)    If returning to ring 3, swap GS.base back
10. iretq                  Discard vector + error_code, return

Stack Layout

After all pushes, the stack contains:

[RSP+0]    saved CR3          (pushed by isr_common_stub)
[RSP+8]    r15                <- cpu_state_t begins here
[RSP+16]   r14
[RSP+24]   r13
[RSP+32]   r12
[RSP+40]   r11
[RSP+48]   r10
[RSP+56]   r9
[RSP+64]   r8
[RSP+72]   rbp
[RSP+80]   rdi
[RSP+88]   rsi
[RSP+96]   rdx
[RSP+104]  rcx
[RSP+112]  rbx
[RSP+120]  rax
[RSP+128]  vector
[RSP+136]  error_code
[RSP+144]  rip                <- CPU-pushed interrupt frame
[RSP+152]  cs
[RSP+160]  rflags
[RSP+168]  rsp                (interrupted context; always pushed in 64-bit mode)
[RSP+176]  ss

This is represented in C as cpu_state_t:

typedef struct cpu_state {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
    uint64_t vector, error_code;
    uint64_t rip, cs, rflags, rsp, ss;   // CPU-pushed
} cpu_state_t;
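Because the assembly stub and the C struct must agree byte for byte, the layout can be pinned down with compile-time checks. The sketch below repeats cpu_state_t and compares its field offsets against the stack offsets listed above; the STACK_OFF macro is a hypothetical helper that adds the 8 bytes of saved CR3 sitting below the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct cpu_state {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
    uint64_t vector, error_code;
    uint64_t rip, cs, rflags, rsp, ss;   /* CPU-pushed */
} cpu_state_t;

/* Stack offset of a field = offset inside cpu_state_t + saved CR3 slot. */
#define STACK_OFF(field) (offsetof(cpu_state_t, field) + 8)

static_assert(STACK_OFF(r15)        == 8,   "r15 is the first field");
static_assert(STACK_OFF(rax)        == 120, "rax is the last pushed GPR");
static_assert(STACK_OFF(vector)     == 128, "vector pushed by the stub");
static_assert(STACK_OFF(error_code) == 136, "error code (real or fake)");
static_assert(STACK_OFF(rip)        == 144, "start of the CPU frame");
static_assert(STACK_OFF(ss)         == 176, "end of the CPU frame");
static_assert(sizeof(cpu_state_t)   == 176, "22 fields of 8 bytes each");
```

If a register is added to or removed from the push sequence, the build fails instead of the kernel reading a shifted frame.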

CR3 Switching

When an interrupt fires during user-mode execution, the CPU’s CR3 contains the user process’s PML4. Kernel code (isr_dispatch, scheduler, printk) must run with the master PML4 so that KVA-mapped objects (TCBs, kernel stacks) are reachable.

The saved CR3 is stored on the stack (not a global), which is critical for correctness across context switches: if sched_tick abandons the current ISR frame and switches to another task, the saved CR3 stays on the old task’s stack and is restored when that task is rescheduled.

SWAPGS

The SWAPGS instruction exchanges GS.base with the IA32_KERNEL_GS_BASE MSR. On ISR entry from ring 3, GS.base holds the user TLS base; after SWAPGS it points to the per-CPU percpu_t structure. The stub decides whether SWAPGS is needed by comparing the saved CS against 0x23 (the user code selector, RPL=3), and the exit path applies the same test before returning to ring 3.

Fork Child Entry

isr_post_dispatch is the entry point for a fork child’s first scheduling. sys_fork builds a fake isr_common_stub frame on the child’s kernel stack and sets the ctx_switch return address to this label. When ctx_switch returns here, the stack looks as if isr_dispatch just returned, and the child enters user space via iretq with rax=0 (fork returns 0 in child).

ISR Dispatch

Source: kernel/arch/x86_64/idt.c

isr_dispatch() is the C-level interrupt handler called from isr_common_stub:

CPU Exceptions (vectors 0–31)

All CPU exceptions trigger a kernel panic with diagnostic output:

[PANIC] exception <vec> at RIP=0x<addr> error=0x<code> CS=0x<cs>

Special diagnostics for:

  • #PF (14): Prints CR2 (faulting address), key registers, FS.base
  • #GP (13): Dumps the iretq frame (all 5 slots: RIP, CS, RFLAGS, RSP, SS)
  • Kernel faults (CS=0x08): Prints stack backtrace by walking the RBP frame-pointer chain (up to 16 frames)

After diagnostics, a bluescreen is rendered on the framebuffer.

Hardware IRQs (vectors 0x20–0x2F)

EOI is sent before the handler runs. This is critical because pit_handler calls sched_tick which calls ctx_switch – if the handler switches to another task before sending EOI, the outgoing task carries the EOI obligation and the IRQ goes dark until that task is rescheduled.

Vector     IRQ   Handler
0x20       0     pit_handler() – tick counter, scheduler, USB/network polling
0x21       1     kbd_handler() – PS/2 keyboard scancode
0x2C       12    ps2_mouse_handler() – PS/2 mouse data
(dynamic)  SCI   acpi_sci_handler() – ACPI system control interrupt
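The EOI-before-handler ordering can be modeled in user space. The sketch below is not the isr_dispatch code: fake_send_eoi and fake_pit_handler are stand-ins for the real EOI routine and pit_handler, and the event log exists only so the ordering can be observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*irq_handler_t)(void);

/* Event log: 'E' = EOI sent, 'H' = handler ran. */
static char   g_log[16];
static size_t g_log_len;

static void log_event(char c)
{
    if (g_log_len < sizeof g_log - 1)
        g_log[g_log_len++] = c;
}

/* Stand-in for the PIC or LAPIC EOI write. */
static void fake_send_eoi(uint8_t irq) { (void)irq; log_event('E'); }

/* Stand-in for pit_handler(); the real one may context-switch away. */
static void fake_pit_handler(void) { log_event('H'); }

static irq_handler_t g_irq_handlers[16] = { [0] = fake_pit_handler };

/* Hypothetical dispatch for vectors 0x20-0x2F. */
static void dispatch_irq(unsigned vector)
{
    assert(vector >= 0x20 && vector <= 0x2F);
    uint8_t irq = (uint8_t)(vector - 0x20);

    fake_send_eoi(irq);              /* acknowledge first ...           */
    if (g_irq_handlers[irq])
        g_irq_handlers[irq]();       /* ... then run code that may not
                                        return to this frame for a while */
}
```

Swapping the two calls in dispatch_irq reproduces the failure mode described above: the EOI is only sent once the outgoing task runs again.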

LAPIC Timer (vector 0x30)

EOI is sent before the handler for the same reason as PIC IRQs. The LAPIC timer runs at approximately 100 Hz (calibrated against the PIT) and drives the scheduler via lapic_timer_handler().

Special Vectors

Vector       Handler                   EOI
0xF0–0xFD    Silently dropped          None (PIC is dead)
0xFE         tlb_shootdown_handler()   Handler sends its own LAPIC EOI
0xFF         Return immediately        None (Intel spec: no EOI for spurious)

The 0xF0–0xFD stubs exist because pic_disable() remaps the 8259A to vectors 0xF0–0xFF before masking. On real hardware, a pending PIC interrupt can be in flight during the remap and delivered after STI. Without IDT entries for those vectors, the CPU would raise #GP, which in v1 is a kernel panic.

AMD SS Normalization

AMD CPUs in 64-bit mode may strip SS RPL bits on ring-3 interrupt entry, pushing SS=0x18 (RPL=0) instead of SS=0x1B (RPL=3). Since iretq to CPL=3 requires SS.RPL == 3, the dispatch function unconditionally forces RPL=3 on the return path for ring-3 interrupts:

if (s->cs == ARCH_USER_CS)
    s->ss |= 3;

This is harmless on Intel (which already pushes RPL=3) and required on AMD.

8259A PIC

Source: kernel/arch/x86_64/pic.c, kernel/arch/x86_64/pic.h

Initialization

The dual 8259A PIC is initialized in cascade mode, remapping IRQ0-15 to vectors 0x20-0x2F:

Step   Master (0x20/0x21)          Slave (0xA0/0xA1)
ICW1   0x11 (init, ICW4 needed)    0x11
ICW2   0x20 (vectors 0x20-0x27)    0x28 (vectors 0x28-0x2F)
ICW3   0x04 (slave on IRQ2)        0x02 (cascade identity 2)
ICW4   0x01 (8086 mode)            0x01
Mask   0xFF (all masked)           0xFF (all masked)

After initialization, all IRQs are masked. Drivers call pic_unmask(irq) when ready.

Spurious IRQ Detection

The PIC can generate spurious IRQ7 (master) or IRQ15 (slave). pic_irq_is_real() reads the In-Service Register (ISR) via OCW3 command 0x0B:

int pic_irq_is_real(uint8_t irq) {
    if (irq < 8) {
        outb(PIC1_CMD, 0x0B);
        return (inb(PIC1_CMD) >> irq) & 1;
    } else {
        outb(PIC2_CMD, 0x0B);
        return (inb(PIC2_CMD) >> (irq - 8)) & 1;
    }
}

For spurious interrupts:

  • IRQ7 (master): No EOI sent at all
  • IRQ15 (slave): EOI sent to master only (PIC1 received the cascade on IRQ2), no EOI to PIC2

A spurious interrupt sets no bit in the ISR, so a non-specific EOI sent for it would instead clear the ISR bit of whichever real interrupt is currently in service.
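The EOI rules for real and spurious interrupts can be written as a small decision function. This is a sketch under assumptions: pic_eoi_plan_t and pic_plan_eoi are hypothetical names, and in_service stands for the result of pic_irq_is_real(irq).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool run_handler;   /* dispatch to the driver handler     */
    bool eoi_master;    /* write 0x20 to the master command port */
    bool eoi_slave;     /* write 0x20 to the slave command port  */
} pic_eoi_plan_t;

/* Decide what to do for `irq` given its In-Service Register bit. */
static pic_eoi_plan_t pic_plan_eoi(uint8_t irq, bool in_service)
{
    assert(irq < 16);
    pic_eoi_plan_t plan = { false, false, false };

    /* Only IRQ7 and IRQ15 can be spurious. */
    bool spurious = !in_service && (irq == 7 || irq == 15);

    if (!spurious) {
        plan.run_handler = true;
        plan.eoi_master  = true;          /* master is always involved   */
        plan.eoi_slave   = (irq >= 8);    /* slave only for IRQ8-15      */
    } else if (irq == 15) {
        plan.eoi_master  = true;          /* master saw the IRQ2 cascade */
    }
    /* Spurious IRQ7: no handler, no EOI at all. */
    return plan;
}
```

Keeping the decision separate from the port writes makes the spurious cases easy to test without hardware.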

EOI

void pic_send_eoi(uint8_t irq) {
    if (irq >= 8)
        outb(PIC2_CMD, 0x20);   // EOI to slave first
    outb(PIC1_CMD, 0x20);       // EOI to master
}

For slave IRQs (8-15), both PICs receive an EOI because the cascade interrupt (IRQ2) is also in-service on the master.

Programmable Interval Timer (PIT)

Source: kernel/arch/x86_64/pit.c, kernel/arch/x86_64/pit.h

Configuration

Parameter             Value
Base frequency        1,193,182 Hz
Divisor               11932
Resulting frequency   ~100 Hz
Mode                  Channel 0, square wave (mode 3)
IRQ                   0 (vector 0x20)
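The divisor in the table follows from rounding the base frequency divided by the target rate. The sketch below reproduces that arithmetic; the function names are hypothetical, and the command byte 0x36 (channel 0, low byte then high byte, mode 3, binary) is the standard 8253/8254 encoding rather than a value quoted from pit.c.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_BASE_HZ   1193182u
#define PIT_CMD_MODE3 0x36u   /* channel 0, lobyte/hibyte, mode 3, binary */

/* Nearest 16-bit divisor for a target tick rate in Hz. */
static uint16_t pit_divisor_for(uint32_t hz)
{
    /* Below 19 Hz the divisor no longer fits in 16 bits. */
    assert(hz >= 19 && hz <= PIT_BASE_HZ);
    return (uint16_t)((PIT_BASE_HZ + hz / 2) / hz);
}

/* Actual tick rate in millihertz for a given divisor. */
static uint32_t pit_actual_millihz(uint16_t divisor)
{
    assert(divisor != 0);
    return (uint32_t)(((uint64_t)PIT_BASE_HZ * 1000u) / divisor);
}

/* The two data bytes written to channel 0 after the command byte. */
static uint8_t pit_divisor_lo(uint16_t divisor) { return (uint8_t)(divisor & 0xFF); }
static uint8_t pit_divisor_hi(uint16_t divisor) { return (uint8_t)(divisor >> 8); }
```

For a 100 Hz target this yields divisor 11932 (0x2E9C) and an actual rate of about 99.998 Hz, which is why the table says "~100 Hz".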

Tick Handler

pit_handler() runs on every tick (~100 Hz) and performs:

  1. Increment tick counter (s_ticks)
  2. Add interrupt entropy to CSPRNG
  3. sched_tick() – preemptive scheduling
  4. xhci_poll() – USB event ring polling
  5. netdev_poll_all() – network device polling
  6. ip_loopback_poll() – loopback queue drain
  7. tcp_tick() – TCP retransmit timer
  8. inb(0x61) – yield to QEMU SLIRP event loop
  9. Wake poll waiter (if any process is blocked on sys_poll)
  10. Check shutdown flag – arch_debug_exit() if set

Wall Clock

The PIT provides a simple wall clock:

void arch_clock_gettime(uint64_t *sec, uint64_t *nsec) {
    *sec  = epoch_offset + ticks / 100;
    *nsec = (ticks % 100) * 10000000UL;
}

epoch_offset is set by sys_clock_settime, which is normally invoked by the NTP daemon.

LAPIC Integration

When the Local APIC is active (lapic_active() returns true), hardware IRQs use LAPIC EOI instead of PIC EOI. The LAPIC timer (vector 0x30) replaces the PIT as the scheduler’s tick source once the idle task starts.

The PIC-to-APIC transition can cause stale interrupts. After ioapic_init(), the kernel flushes the i8042 output buffer to clear stale keyboard scancodes that hold IRQ1 asserted:

while (inb(0x64) & 0x01)
    (void)inb(0x60);

This fixes intermittent “no keyboard on boot” on bare metal when stale scancodes from BIOS/GRUB prevent new keyboard interrupts.

TLB Shootdown

Vector 0xFE is used for inter-processor TLB invalidation. When a page table entry is modified for a shared address space, tlb_shootdown() sends an IPI to all other CPUs. The handler performs the local invlpg and sends its own LAPIC EOI (separate from the normal EOI path in isr_dispatch).
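One way to reach "all other CPUs" with a single write is the LAPIC's all-excluding-self destination shorthand. The sketch below only builds the low dword of the Interrupt Command Register; whether tlb_shootdown() uses this shorthand or targets CPUs one by one is not stated here, so treat it as an assumption. The constant and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_DELIVERY_FIXED      (0u << 8)    /* bits 10:8 = 000          */
#define ICR_LEVEL_ASSERT        (1u << 14)   /* bit 14                   */
#define ICR_DEST_ALL_EXCL_SELF  (3u << 18)   /* bits 19:18 = 11          */

/* Low 32 bits of the ICR for a fixed IPI to every CPU except the sender. */
static uint32_t lapic_icr_all_but_self(uint8_t vector)
{
    assert(vector >= 0x10);   /* vectors 0-15 are illegal for fixed delivery */
    return (uint32_t)vector
         | ICR_DELIVERY_FIXED
         | ICR_LEVEL_ASSERT
         | ICR_DEST_ALL_EXCL_SELF;
}
```

For vector 0xFE the result is 0x000C40FE; writing it to the ICR low register would trigger the IPI on a real LAPIC.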

Shutdown Path

Clean shutdown is requested via arch_request_shutdown(), which sets a flag checked by pit_handler on the next tick. The actual arch_debug_exit() is deferred to ISR context (IF=0) to prevent the QEMU async race where task code continues running after the port write.

void arch_request_shutdown(void) {
    s_shutdown = 1;
    // pit_handler checks this after sched_tick, calls arch_debug_exit(0x01)
}

QEMU’s isa-debug-exit device exits with code (value << 1) | 1, so writing 0x01 produces exit code 3.
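The exit-code arithmetic is easy to get wrong in test scripts, so here it is as a tiny function. qemu_debug_exit_status is a hypothetical name for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Process exit status QEMU reports after `value` is written to the
 * isa-debug-exit port: (value << 1) | 1. */
static int qemu_debug_exit_status(uint8_t value)
{
    assert(value <= 0x7F);   /* keep the result within an 8-bit exit status */
    return ((int)value << 1) | 1;
}
```

Because the low bit is always set, the status is never 0, so a test harness has to treat 3 (not 0) as the clean-shutdown result for a write of 0x01.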

Panic and Bluescreen

All CPU exceptions in v1 are fatal – the kernel does not attempt recovery for any fault, including potentially recoverable ones like #PF in certain contexts. This is a deliberate v1 simplification. A production kernel would need fault fixup tables, per-process signal delivery for user-mode faults, and graceful handling of kernel-mode recoverable faults. These are areas where the planned C-to-Rust migration will provide additional safety guarantees.

When a CPU exception occurs, after printing diagnostics to serial, the kernel renders a bluescreen on the framebuffer via panic_bluescreen(). For kernel-mode faults (CS == ARCH_KERNEL_CS), a stack backtrace is printed by walking the RBP chain:

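static void panic_backtrace(uint64_t rbp) {
    for (int i = 0; i < 16; i++) {
        // Stop on a frame pointer below 0xFFFFFFFF80000000 or not 8-byte aligned
        if (rbp < 0xFFFFFFFF80000000ULL || (rbp & 7ULL))
            break;
        uint64_t retaddr = ((uint64_t *)rbp)[1];   // return address sits just above the saved RBP
        printk("[PANIC]   [%d] 0x%lx\n", i, retaddr);
        rbp = ((uint64_t *)rbp)[0];                // follow the chain to the caller's frame
    }
}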

Addresses can be resolved with make sym ADDR=0x<addr>.

See Also