Interrupts & Exceptions
IDT setup, ISR dispatch, PIC/APIC interrupt routing, exception handling, and CR3/SWAPGS management
Aegis handles all interrupts and CPU exceptions through a unified IDT with 256 entries. Hardware interrupts are routed through both the legacy 8259A PIC and the APIC (Local APIC + I/O APIC) for SMP support. The ISR entry path manages CR3 switching and SWAPGS for safe transitions between user and kernel address spaces.
v1 maturity notice. The interrupt and exception handling code is written in a mix of x86-64 assembly (isr.asm) and C (idt.c). While the ISR entry path handles SWAPGS, CR3 switching, and AMD SS normalization correctly for known cases, this is v1 software in a predominantly C codebase. The exception handler panics on all CPU faults rather than attempting recovery – there is no fault fixup table. Subtle bugs in the assembly entry path or the C dispatch logic could be exploitable. A gradual Rust migration is planned for the kernel; kernel/cap/ is already in Rust as the first step. Contributions are welcome – file issues or propose changes at exec/aegis.
Interrupt Descriptor Table (IDT)
Source: kernel/arch/x86_64/idt.c, kernel/arch/x86_64/idt.h
Gate Descriptor Format
Each IDT gate is a 16-byte x86-64 interrupt gate descriptor:
typedef struct {
    uint16_t offset_lo;   // Handler address bits [0:15]
    uint16_t selector;    // Kernel code segment (0x08)
    uint8_t  ist;         // Interrupt Stack Table index (0 = no IST)
    uint8_t  type_attr;   // 0x8E = present, DPL=0, 64-bit interrupt gate
    uint16_t offset_mid;  // Handler address bits [16:31]
    uint32_t offset_hi;   // Handler address bits [32:63]
    uint32_t zero;        // Reserved
} aegis_idt_gate_t;
All gates use type_attr = 0x8E (present, DPL=0, interrupt gate). An interrupt gate clears IF on entry, so maskable interrupts cannot nest while a handler runs; NMIs and CPU exceptions can still arrive.
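To make the field layout concrete, here is a minimal sketch of how a 64-bit handler address could be split across the three offset fields. The names gate_sketch_t, gate_encode, and gate_handler are hypothetical and are not taken from idt.c; the real idt_gate_set() may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of aegis_idt_gate_t, for illustration only. */
typedef struct {
    uint16_t offset_lo;
    uint16_t selector;
    uint8_t  ist;
    uint8_t  type_attr;
    uint16_t offset_mid;
    uint32_t offset_hi;
    uint32_t zero;
} gate_sketch_t;

/* The fields are naturally aligned, so no packing attribute is needed. */
static_assert(sizeof(gate_sketch_t) == 16, "gate must be 16 bytes");

/* Build an interrupt gate for `handler` (assumed helper, not the kernel's). */
static gate_sketch_t gate_encode(uint64_t handler, uint8_t ist) {
    gate_sketch_t g;
    g.offset_lo  = (uint16_t)(handler & 0xFFFF);          /* bits [0:15]  */
    g.selector   = 0x08;                                  /* kernel code segment */
    g.ist        = ist;                                   /* 0 = no IST   */
    g.type_attr  = 0x8E;                                  /* present, DPL=0, interrupt gate */
    g.offset_mid = (uint16_t)((handler >> 16) & 0xFFFF);  /* bits [16:31] */
    g.offset_hi  = (uint32_t)(handler >> 32);             /* bits [32:63] */
    g.zero       = 0;
    return g;
}

/* Reassemble the handler address from a gate. */
static uint64_t gate_handler(const gate_sketch_t *g) {
    return (uint64_t)g->offset_lo
         | ((uint64_t)g->offset_mid << 16)
         | ((uint64_t)g->offset_hi << 32);
}
```

Encoding and then decoding a higher-half address such as 0xFFFFFFFF80123456 should round-trip unchanged, which is a quick sanity check for the shifts.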
Vector Assignment
| Range | Count | Purpose |
|---|---|---|
| 0x00 – 0x1F | 32 | CPU exceptions |
| 0x20 – 0x2F | 16 | Hardware IRQs (PIC-remapped) |
| 0x30 | 1 | LAPIC timer |
| 0xF0 – 0xFD | 14 | Remapped PIC stubs (stale interrupt catchers) |
| 0xFE | 1 | TLB shootdown IPI |
| 0xFF | 1 | LAPIC spurious interrupt |
Special Gates
Double fault (#DF, vector 8) uses IST1 – a dedicated interrupt stack. This prevents a triple fault when #DF occurs due to stack overflow or RSP corruption. IST field value 1 in the gate maps to tss.ist[0] (Intel uses 1-indexed IST in the gate, C array is 0-indexed).
LAPIC spurious (0xFF) – per Intel specification, no EOI must be sent for spurious interrupts.
Initialization
void idt_init(void) {
    // Install 48 ISR stubs (vectors 0-31 + IRQ 0x20-0x2F)
    for (int i = 0; i < 48; i++)
        idt_gate_set(i, isr_stubs[i]);
    idt_gate_set(0x30, isr_stub_lapic_timer);
    // Remapped PIC vectors 0xF0-0xFD
    for (int i = 0; i < 14; i++)
        idt_gate_set(0xF0 + i, isr_stubs_remap_pic[i]);
    idt_gate_set(0xFE, isr_stub_tlb_shootdown);
    idt_gate_set(0xFF, isr_stub_spurious);
    s_idt[8].ist = 1; // #DF uses IST1
    __asm__ volatile("lidt %0" : : "m"(idtr));
}
The IDT is shared across all CPUs. APs call arch_load_idt() to reload the IDTR without reinitializing gate entries.
ISR Entry Path
Source: kernel/arch/x86_64/isr.asm
Stub Macros
Two macros generate per-vector entry stubs:
; For exceptions that DO NOT push an error code:
%macro ISR_NOERR 1
isr_%1:
    push qword 0          ; fake error code (uniform frame)
    push qword %1         ; vector number
    jmp isr_common_stub
%endmacro
; For exceptions that push an error code:
%macro ISR_ERR 1
isr_%1:
    push qword %1         ; vector number (error code already pushed by CPU)
    jmp isr_common_stub
%endmacro
Exception Error Code Classification
| With Error Code | Without Error Code |
|---|---|
| #DF (8), #TS (10), #NP (11), #SS (12), #GP (13), #PF (14), #AC (17), #CP (21), #VC (29), #SX (30) | #DE (0), #DB (1), NMI (2), #BP (3), #OF (4), #BR (5), #UD (6), #NM (7), all reserved, all IRQs |
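The classification above can be expressed as a 32-bit mask with one bit per exception vector. This is only an illustrative sketch; ERRCODE_VECTORS and exception_has_error_code are hypothetical names, and the kernel itself encodes the choice statically through the ISR_ERR and ISR_NOERR macros.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per exception vector that pushes an error code (from the table). */
#define ERRCODE_VECTORS ( \
    (1u << 8)  | (1u << 10) | (1u << 11) | (1u << 12) | (1u << 13) | \
    (1u << 14) | (1u << 17) | (1u << 21) | (1u << 29) | (1u << 30))

/* True if the CPU pushes an error code for this vector.
 * IRQs and other vectors >= 32 never do. */
static bool exception_has_error_code(uint8_t vector) {
    return vector < 32 && ((ERRCODE_VECTORS >> vector) & 1u) != 0;
}
```

For example, exception_has_error_code(14) is true for #PF, while exception_has_error_code(3) is false for #BP, so the #BP stub must push the fake zero itself.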
isr_common_stub – Full Entry Sequence
The common stub manages the complete transition from interrupted context to C handler:
1. SWAPGS (conditional): if CS on stack == 0x23 (user code), swap GS.base so kernel GS points to percpu_t
2. Push all GPRs: rax, rbx, rcx, rdx, rsi, rdi, rbp, r8-r15
3. Save CR3: push current CR3 onto stack
4. Switch to master PML4: if g_master_pml4 != 0 and CR3 != master, load master
5. Call isr_dispatch(): pass cpu_state_t* (RSP+8, skipping saved CR3)
6. Call signal_deliver(): check pending signals before ring-3 return
7. Restore CR3: pop saved CR3, reload if non-zero
8. Restore all GPRs: pop in reverse order
9. SWAPGS (conditional): if returning to ring 3, swap GS.base back
10. iretq: discard vector + error_code, return
Stack Layout
After all pushes, the stack contains:
[RSP+0] saved CR3 (pushed by isr_common_stub)
[RSP+8] r15 <- cpu_state_t begins here
[RSP+16] r14
[RSP+24] r13
[RSP+32] r12
[RSP+40] r11
[RSP+48] r10
[RSP+56] r9
[RSP+64] r8
[RSP+72] rbp
[RSP+80] rdi
[RSP+88] rsi
[RSP+96] rdx
[RSP+104] rcx
[RSP+112] rbx
[RSP+120] rax
[RSP+128] vector
[RSP+136] error_code
[RSP+144] rip <- CPU-pushed interrupt frame
[RSP+152] cs
[RSP+160] rflags
[RSP+168] rsp (interrupted context; in 64-bit mode the CPU always pushes RSP and SS, even on ring-0 -> ring-0 entry)
[RSP+176] ss (interrupted context)
This is represented in C as cpu_state_t:
typedef struct cpu_state {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
    uint64_t vector, error_code;
    uint64_t rip, cs, rflags, rsp, ss; // CPU-pushed
} cpu_state_t;
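Because the assembly stub and the C struct must agree byte for byte, the layout can be pinned down with compile-time checks. The following is a sketch, not code from the repository: STACK_OFF is a hypothetical helper that adds the 8 bytes occupied by the saved CR3 below the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct cpu_state {
    uint64_t r15, r14, r13, r12, r11, r10, r9, r8;
    uint64_t rbp, rdi, rsi, rdx, rcx, rbx, rax;
    uint64_t vector, error_code;
    uint64_t rip, cs, rflags, rsp, ss; /* CPU-pushed */
} cpu_state_t;

/* Offset from RSP at the call to isr_dispatch(): saved CR3 sits at [RSP+0]. */
#define STACK_OFF(field) (offsetof(cpu_state_t, field) + 8)

static_assert(STACK_OFF(r15)        == 8,   "r15 follows saved CR3");
static_assert(STACK_OFF(rax)        == 120, "rax is the first GPR pushed");
static_assert(STACK_OFF(vector)     == 128, "vector pushed by the stub macro");
static_assert(STACK_OFF(error_code) == 136, "real or fake error code");
static_assert(STACK_OFF(rip)        == 144, "start of the CPU-pushed frame");
static_assert(STACK_OFF(ss)         == 176, "last slot of the iretq frame");
static_assert(sizeof(cpu_state_t)   == 22 * 8, "22 quadwords, no padding");
```

If a register is added to or removed from the push sequence without updating the struct, one of these assertions fails at build time instead of producing a corrupted frame at run time.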
CR3 Switching
When an interrupt fires during user-mode execution, the CPU’s CR3 contains the user process’s PML4. Kernel code (isr_dispatch, scheduler, printk) must run with the master PML4 so that KVA-mapped objects (TCBs, kernel stacks) are reachable.
The saved CR3 is stored on the stack (not a global), which is critical for correctness across context switches: if sched_tick abandons the current ISR frame and switches to another task, the saved CR3 stays on the old task’s stack and is restored when that task is rescheduled.
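The save, switch, and restore logic (steps 3, 4, and 7 of the entry sequence) can be modeled in plain C. This is a simplified sketch under stated assumptions: cpu_model_t stands in for the real CR3 register, and isr_enter_cr3 / isr_exit_cr3 are hypothetical names; the actual logic lives in isr.asm.

```c
#include <stdint.h>

/* Hypothetical stand-in for the CPU's CR3 register. */
typedef struct {
    uint64_t cr3;
} cpu_model_t;

/* Steps 3 and 4: return the value to keep in this ISR frame, and switch to
 * the master PML4 when one is set and is not already loaded. */
static uint64_t isr_enter_cr3(cpu_model_t *cpu, uint64_t master_pml4) {
    uint64_t saved = cpu->cr3;
    if (master_pml4 != 0 && cpu->cr3 != master_pml4)
        cpu->cr3 = master_pml4;
    return saved;
}

/* Step 7: reload the value saved in this frame if it is non-zero. */
static void isr_exit_cr3(cpu_model_t *cpu, uint64_t saved) {
    if (saved != 0)
        cpu->cr3 = saved;
}
```

The saved value is a return value held by the caller here, which mirrors keeping it in the ISR frame: each interrupted task carries its own copy, so a context switch in the middle of the handler cannot overwrite it the way a single global could.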
SWAPGS
The SWAPGS instruction swaps GS.base with IA32_KERNEL_GS_BASE MSR. At ISR entry from ring 3, GS.base points to the user’s TLS; after SWAPGS, it points to the per-CPU percpu_t structure. The conditional check uses CS == 0x23 (user code selector with RPL=3) to determine if SWAPGS is needed.
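As an illustration of the conditional swap, the sketch below models the two base registers as plain fields. gs_model_t, isr_entry_gs, and isr_exit_gs are hypothetical names used only for this example; the real check is a few instructions in isr.asm.

```c
#include <stdint.h>

#define USER_CS 0x23u /* user code selector with RPL=3, as named in the text */

/* Model of the two registers that SWAPGS exchanges. */
typedef struct {
    uint64_t gs_base;        /* active GS.base */
    uint64_t kernel_gs_base; /* IA32_KERNEL_GS_BASE MSR */
} gs_model_t;

static void swapgs(gs_model_t *m) {
    uint64_t tmp = m->gs_base;
    m->gs_base = m->kernel_gs_base;
    m->kernel_gs_base = tmp;
}

/* Entry: swap only when the interrupted code was running in ring 3. */
static void isr_entry_gs(gs_model_t *m, uint64_t saved_cs) {
    if (saved_cs == USER_CS)
        swapgs(m);
}

/* Exit: swap back under the same condition so the pair stays balanced. */
static void isr_exit_gs(gs_model_t *m, uint64_t saved_cs) {
    if (saved_cs == USER_CS)
        swapgs(m);
}
```

An interrupt taken while already in the kernel (CS=0x08) leaves both values untouched, which is the case the condition exists to protect: an unconditional swap there would load the user value into GS.base while kernel code is running.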
Fork Child Entry
isr_post_dispatch is the entry point for a fork child’s first scheduling. sys_fork builds a fake isr_common_stub frame on the child’s kernel stack and sets the ctx_switch return address to this label. When ctx_switch returns here, the stack looks as if isr_dispatch just returned, and the child enters user space via iretq with rax=0 (fork returns 0 in child).
ISR Dispatch
Source: kernel/arch/x86_64/idt.c
isr_dispatch() is the C-level interrupt handler called from isr_common_stub. It handles each vector range differently, as described in the subsections below.
CPU Exceptions (vectors 0–31)
All CPU exceptions trigger a kernel panic with diagnostic output:
[PANIC] exception <vec> at RIP=0x<addr> error=0x<code> CS=0x<cs>
Special diagnostics for:
- #PF (14): Prints CR2 (faulting address), key registers, FS.base
- #GP (13): Dumps the iretq frame (all 5 slots: RIP, CS, RFLAGS, RSP, SS)
- Kernel faults (CS=0x08): Prints stack backtrace by walking the RBP frame-pointer chain (up to 16 frames)
After diagnostics, a bluescreen is rendered on the framebuffer.
Hardware IRQs (vectors 0x20–0x2F)
EOI is sent before the handler runs. This is critical because pit_handler calls sched_tick which calls ctx_switch – if the handler switches to another task before sending EOI, the outgoing task carries the EOI obligation and the IRQ goes dark until that task is rescheduled.
| Vector | IRQ | Handler |
|---|---|---|
| 0x20 | 0 | pit_handler() – tick counter, scheduler, USB/network polling |
| 0x21 | 1 | kbd_handler() – PS/2 keyboard scancode |
| 0x2C | 12 | ps2_mouse_handler() – PS/2 mouse data |
| (dynamic) | SCI | acpi_sci_handler() – ACPI system control interrupt |
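The EOI-before-handler ordering can be sketched with a small handler table. Everything here is a mock for illustration: mock_send_eoi, mock_pit_handler, g_handlers, and the trace buffer are hypothetical and only record the order of events; they do not touch hardware.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*irq_handler_t)(void);

/* Event trace: 'E' = EOI sent, 'H' = handler ran. */
static char   g_trace[16];
static size_t g_trace_len;

static void trace(char c) {
    if (g_trace_len < sizeof g_trace - 1)
        g_trace[g_trace_len++] = c;
}

static void mock_send_eoi(uint8_t irq) { (void)irq; trace('E'); }
static void mock_pit_handler(void)     { trace('H'); }

/* Hypothetical handler table indexed by IRQ number (vector - 0x20). */
static irq_handler_t g_handlers[16] = { [0] = mock_pit_handler };

static void dispatch_irq(uint8_t vector) {
    if (vector < 0x20 || vector > 0x2F)
        return;                      /* not a PIC-remapped IRQ vector */
    uint8_t irq = (uint8_t)(vector - 0x20);
    mock_send_eoi(irq);              /* acknowledge first ...               */
    if (g_handlers[irq] != NULL)
        g_handlers[irq]();           /* ... because this may switch tasks   */
}
```

After dispatch_irq(0x20) the trace reads "EH": the acknowledgement is already done by the time the handler has a chance to switch away from the current task.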
LAPIC Timer (vector 0x30)
EOI is sent before the handler for the same reason as PIC IRQs. The LAPIC timer runs at approximately 100 Hz (calibrated against the PIT) and drives the scheduler via lapic_timer_handler().
Special Vectors
| Vector | Handler | EOI |
|---|---|---|
| 0xF0–0xFD | Silently dropped | None (PIC is dead) |
| 0xFE | tlb_shootdown_handler() | Handler sends its own LAPIC EOI |
| 0xFF | Return immediately | None (Intel spec: no EOI for spurious) |
The 0xF0–0xFD stubs exist because pic_disable() remaps the 8259A to vectors 0xF0–0xFF before masking. On real hardware, a pending PIC interrupt can be in-flight during the remap and delivered after STI. Without IDT entries, the CPU raises #GP.
AMD SS Normalization
AMD CPUs in 64-bit mode may strip SS RPL bits on ring-3 interrupt entry, pushing SS=0x18 (RPL=0) instead of SS=0x1B (RPL=3). Since iretq to CPL=3 requires SS.RPL == 3, the dispatch function unconditionally forces RPL=3 on the return path for ring-3 interrupts:
if (s->cs == ARCH_USER_CS)
    s->ss |= 3;
This is harmless on Intel (which already pushes RPL=3) and required on AMD.
8259A PIC
Source: kernel/arch/x86_64/pic.c, kernel/arch/x86_64/pic.h
Initialization
The dual 8259A PIC is initialized in cascade mode, remapping IRQ0-15 to vectors 0x20-0x2F:
| Step | Master (0x20/0x21) | Slave (0xA0/0xA1) |
|---|---|---|
| ICW1 | 0x11 (init, ICW4 needed) | 0x11 |
| ICW2 | 0x20 (vectors 0x20-0x27) | 0x28 (vectors 0x28-0x2F) |
| ICW3 | 0x04 (slave on IRQ2) | 0x02 (cascade identity 2) |
| ICW4 | 0x01 (8086 mode) | 0x01 |
| Mask | 0xFF (all masked) | 0xFF (all masked) |
After initialization, all IRQs are masked. Drivers call pic_unmask(irq) when ready.
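The table translates into a fixed sequence of port writes. The sketch below records the writes in a log instead of touching hardware, so it can be run in user space. mock_outb, the log, pic_init_sketch, and the PIC1_DATA / PIC2_DATA names are assumptions for illustration; the ordering in pic.c may differ, and real code often inserts a short I/O delay between writes on older hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define PIC1_CMD  0x20
#define PIC1_DATA 0x21 /* assumed name for the master data port */
#define PIC2_CMD  0xA0
#define PIC2_DATA 0xA1 /* assumed name for the slave data port */

typedef struct { uint16_t port; uint8_t val; } io_write_t;

static io_write_t g_io_log[16];
static size_t     g_io_len;

/* Mock port write: records the access instead of executing OUT. */
static void mock_outb(uint16_t port, uint8_t val) {
    if (g_io_len < sizeof g_io_log / sizeof g_io_log[0])
        g_io_log[g_io_len++] = (io_write_t){ port, val };
}

static void pic_init_sketch(void) {
    mock_outb(PIC1_CMD,  0x11); /* ICW1: init, ICW4 needed      */
    mock_outb(PIC2_CMD,  0x11);
    mock_outb(PIC1_DATA, 0x20); /* ICW2: master base vector     */
    mock_outb(PIC2_DATA, 0x28); /* ICW2: slave base vector      */
    mock_outb(PIC1_DATA, 0x04); /* ICW3: slave attached to IRQ2 */
    mock_outb(PIC2_DATA, 0x02); /* ICW3: cascade identity 2     */
    mock_outb(PIC1_DATA, 0x01); /* ICW4: 8086 mode              */
    mock_outb(PIC2_DATA, 0x01);
    mock_outb(PIC1_DATA, 0xFF); /* mask all master IRQs         */
    mock_outb(PIC2_DATA, 0xFF); /* mask all slave IRQs          */
}
```

ICW1 goes to the command port and ICW2 through ICW4 go to the data port of each chip; the final two writes set the interrupt mask registers.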
Spurious IRQ Detection
The PIC can generate spurious IRQ7 (master) or IRQ15 (slave). pic_irq_is_real() reads the In-Service Register (ISR) via OCW3 command 0x0B:
int pic_irq_is_real(uint8_t irq) {
    if (irq < 8) {
        outb(PIC1_CMD, 0x0B);
        return (inb(PIC1_CMD) >> irq) & 1;
    } else {
        outb(PIC2_CMD, 0x0B);
        return (inb(PIC2_CMD) >> (irq - 8)) & 1;
    }
}
For spurious interrupts:
- IRQ7 (master): No EOI sent at all
- IRQ15 (slave): EOI sent to master only (PIC1 received the cascade on IRQ2), no EOI to PIC2
Sending EOI for a spurious interrupt would clear the ISR bit of a real in-service interrupt.
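The EOI rules for real and spurious interrupts can be summarized as a single decision function. This is a sketch of the policy as described here, not the kernel's code: eoi_targets and the EOI_* flags are hypothetical names, and is_real is assumed to be the result of pic_irq_is_real().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit flags naming which PIC should receive an EOI. */
enum { EOI_NONE = 0, EOI_MASTER = 1, EOI_SLAVE = 2 };

static unsigned eoi_targets(uint8_t irq, bool is_real) {
    if (!is_real) {
        if (irq == 7)
            return EOI_NONE;    /* spurious on master: nothing is in service */
        if (irq == 15)
            return EOI_MASTER;  /* master still has the IRQ2 cascade in service */
    }
    /* Real interrupt: slave IRQs (8-15) acknowledge both chips. */
    return irq >= 8 ? (EOI_SLAVE | EOI_MASTER) : EOI_MASTER;
}
```

For example, eoi_targets(15, false) yields EOI_MASTER only, while eoi_targets(15, true) yields both flags.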
EOI
void pic_send_eoi(uint8_t irq) {
    if (irq >= 8)
        outb(PIC2_CMD, 0x20); // EOI to slave first
    outb(PIC1_CMD, 0x20);     // EOI to master
}
For slave IRQs (8-15), both PICs receive an EOI because the cascade interrupt (IRQ2) is also in-service on the master.
Programmable Interval Timer (PIT)
Source: kernel/arch/x86_64/pit.c, kernel/arch/x86_64/pit.h
Configuration
| Parameter | Value |
|---|---|
| Base frequency | 1,193,182 Hz |
| Divisor | 11932 |
| Resulting frequency | ~100 Hz |
| Mode | Channel 0, square wave (mode 3) |
| IRQ | 0 (vector 0x20) |
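The divisor in the table is consistent with rounding the base frequency divided by the target rate to the nearest integer. The sketch below shows that arithmetic; pit_divisor and pit_actual_hz are hypothetical helpers, and whether pit.c computes the value or hard-codes it is not stated here.

```c
#include <stdint.h>

#define PIT_BASE_HZ 1193182u

/* Round-to-nearest divisor for a target tick rate.
 * target_hz must be at least 19 so the result fits in 16 bits. */
static uint16_t pit_divisor(uint32_t target_hz) {
    return (uint16_t)((PIT_BASE_HZ + target_hz / 2) / target_hz);
}

/* Frequency actually produced by a given divisor. */
static double pit_actual_hz(uint16_t divisor) {
    return (double)PIT_BASE_HZ / (double)divisor;
}
```

For a 100 Hz target this gives 11932, and 1,193,182 / 11932 is slightly below 100 Hz (about 99.998 Hz), which is why the table lists the result as approximate.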
Tick Handler
pit_handler() runs on every tick (~100 Hz) and performs:
- Increment tick counter (s_ticks)
- Add interrupt entropy to CSPRNG
- sched_tick() – preemptive scheduling
- xhci_poll() – USB event ring polling
- netdev_poll_all() – network device polling
- ip_loopback_poll() – loopback queue drain
- tcp_tick() – TCP retransmit timer
- inb(0x61) – yield to QEMU SLIRP event loop
- Wake poll waiter (if any process is blocked on sys_poll)
- Check shutdown flag – arch_debug_exit() if set
Wall Clock
The PIT provides a simple wall clock:
void arch_clock_gettime(uint64_t *sec, uint64_t *nsec) {
    *sec = epoch_offset + ticks / 100;
    *nsec = (ticks % 100) * 10000000UL;
}
epoch_offset is set by sys_clock_settime (NTP daemon).
LAPIC Integration
When the Local APIC is active (lapic_active() returns true), hardware IRQs use LAPIC EOI instead of PIC EOI. The LAPIC timer (vector 0x30) replaces the PIT as the scheduler’s tick source once the idle task starts.
The PIC-to-APIC transition can cause stale interrupts. After ioapic_init(), the kernel flushes the i8042 output buffer to clear stale keyboard scancodes that hold IRQ1 asserted:
while (inb(0x64) & 0x01)
    (void)inb(0x60);
This fixes intermittent “no keyboard on boot” on bare metal when stale scancodes from BIOS/GRUB prevent new keyboard interrupts.
TLB Shootdown
Vector 0xFE is used for inter-processor TLB invalidation. When a page table entry is modified for a shared address space, tlb_shootdown() sends an IPI to all other CPUs. The handler performs the local invlpg and sends its own LAPIC EOI (separate from the normal EOI path in isr_dispatch).
Shutdown Path
Clean shutdown is requested via arch_request_shutdown(), which sets a flag checked by pit_handler on the next tick. The actual arch_debug_exit() is deferred to ISR context (IF=0) to prevent the QEMU async race where task code continues running after the port write.
void arch_request_shutdown(void) {
    s_shutdown = 1;
    // pit_handler checks this after sched_tick, calls arch_debug_exit(0x01)
}
QEMU’s isa-debug-exit device exits with code (value << 1) | 1, so writing 0x01 produces exit code 3.
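The exit-code mapping is simple enough to state as a one-line function. qemu_debug_exit_status is a hypothetical helper for illustration and is not part of the kernel or of QEMU.

```c
#include <stdint.h>

/* Process exit status QEMU reports for a byte written to isa-debug-exit. */
static int qemu_debug_exit_status(uint8_t value) {
    return ((int)value << 1) | 1;
}
```

Because the low bit is always set, the resulting status is always odd and can never be 0, so a test harness that wraps QEMU has to treat the chosen value (3 for a write of 0x01) as success rather than relying on a zero exit status.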
Panic and Bluescreen
All CPU exceptions in v1 are fatal – the kernel does not attempt recovery for any fault, including potentially recoverable ones like #PF in certain contexts. This is a deliberate v1 simplification. A production kernel would need fault fixup tables, per-process signal delivery for user-mode faults, and graceful handling of kernel-mode recoverable faults. These are areas where the planned C-to-Rust migration will provide additional safety guarantees.
When a CPU exception occurs, after printing diagnostics to serial, the kernel renders a bluescreen on the framebuffer via panic_bluescreen(). For kernel-mode faults (CS == ARCH_KERNEL_CS), a stack backtrace is printed by walking the RBP chain:
static void panic_backtrace(uint64_t rbp) {
    for (int i = 0; i < 16; i++) {
        if (rbp < 0xFFFFFFFF80000000ULL || (rbp & 7ULL))
            break;
        uint64_t retaddr = ((uint64_t *)rbp)[1];
        printk("[PANIC] [%u] 0x%lx\n", i, retaddr);
        rbp = ((uint64_t *)rbp)[0];
    }
}
Addresses can be resolved with make sym ADDR=0x<addr>.
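The frame-pointer walk can be exercised in user space against a hand-built chain of frames. The sketch below is an adaptation, not the kernel routine: walk_frames is a hypothetical name, it collects return addresses into an array instead of printing, and it replaces the kernel-address lower bound with an explicit [lo, hi) range so it works on an ordinary buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Each frame is two quadwords: [0] = caller's saved RBP, [1] = return address.
 * Walk at most `max` frames, stopping at the first pointer that is
 * misaligned or does not leave room for a full frame inside [lo, hi). */
static size_t walk_frames(uintptr_t rbp, uintptr_t lo, uintptr_t hi,
                          uint64_t *out, size_t max) {
    size_t n = 0;
    while (n < max) {
        if (rbp < lo || hi - lo < 16 || rbp > hi - 16 || (rbp & 7u))
            break;
        const uint64_t *frame = (const uint64_t *)rbp;
        out[n++] = frame[1];            /* return address */
        rbp = (uintptr_t)frame[0];      /* follow the chain to the caller */
    }
    return n;
}
```

A chain of three fake frames terminated by a zero saved RBP should yield exactly three return addresses. The upper bound is an addition for this user-space model; the kernel version shown earlier checks only the lower bound and alignment.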
See Also
- Boot Process – IDT initialization order, PIC/PIT setup
- Memory Management – CR3 switching, page fault implications
- Scheduler – sched_tick, context switching from ISR
- Syscall Interface – SYSCALL/SYSRET entry (separate from interrupt path)