Device Drivers

Aegis includes a set of polling-based device drivers for storage, networking, USB input, and display. All driver code is written in C and interacts directly with hardware via MMIO and DMA – the most security-sensitive kind of kernel code. As v1 software, these drivers have not been formally audited and should be assumed to contain bugs that could be triggered by malicious or malformed hardware responses (e.g., a compromised NIC returning crafted DMA descriptors). This is expected for any from-scratch driver stack at this maturity level. The planned C-to-Rust kernel migration will eventually cover driver code, but drivers are not the first target – the capability system (kernel/cap/src/lib.rs) is the initial Rust beachhead.

Contributions are welcome – file issues or propose changes at exec/aegis.

All drivers follow a common pattern:

  1. PCI enumeration via the ECAM config space scanner
  2. BAR MMIO mapping into kernel virtual address space (uncached)
  3. DMA buffer allocation via kva_alloc_pages() + kva_page_phys()
  4. Polling at 100 Hz from the PIT timer ISR (no MSI/MSI-X interrupts)
  5. Registration with a subsystem abstraction layer (blkdev_t, netdev_t)

PCI Express Enumeration (pcie.h / pcie.c)

ECAM Config Space

PCIe configuration space is accessed via ECAM (Enhanced Configuration Access Mechanism). The MCFG ACPI table provides the MMIO base address. The scanner iterates all bus/device/function combinations:

#define PCIE_MAX_DEVICES 64

typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  class_code;
    uint8_t  subclass;
    uint8_t  progif;
    uint8_t  bus, dev, fn;
    uint64_t bar[6];         /* decoded BAR base addresses (64-bit aware) */
} pcie_device_t;

pcie_init() populates a table of up to 64 devices. Drivers locate their hardware via:

  • pcie_find_device(class, subclass, progif): Match by class code (pass 0xFF to match any field)
  • pcie_get_devices() / pcie_device_count(): Iterate the full device table

Config Space Accessors

uint8_t  pcie_read8 (uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off);
uint16_t pcie_read16(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off);
uint32_t pcie_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off);
void     pcie_write32(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off, uint32_t val);

These are used by drivers to read/write PCI capability lists, command registers, and BAR-related configuration.

BAR Mapping Pattern

All drivers follow the same pattern to map BAR MMIO into kernel address space:

#define MMIO_FLAGS (VMM_FLAG_WRITABLE | VMM_FLAG_WC | VMM_FLAG_UCMINUS)

static uintptr_t map_bar(uint64_t pa, uint32_t n_pages)
{
    uintptr_t va = (uintptr_t)kva_alloc_pages(n_pages);
    for (uint32_t i = 0; i < n_pages; i++) {
        uintptr_t page_va = va + i * 4096;
        vmm_unmap_page(page_va);  /* remove kva default mapping */
        vmm_map_page(page_va, pa + i * 4096, MMIO_FLAGS);
    }
    return va;
}

kva_alloc_pages() allocates PMM frames and maps them; the mapping is then overwritten with the BAR physical address using uncached MMIO flags. The original PMM frames are leaked (acceptable during init).

DMA Page Allocation

static void alloc_dma_page(uint64_t *phys_out, uintptr_t *virt_out)
{
    uintptr_t va = (uintptr_t)kva_alloc_pages(1);
    uint64_t  pa = kva_page_phys((void *)va);
    memset((void *)va, 0, 4096);
    *phys_out = pa;
    *virt_out = va;
}

Returns both a kernel virtual address (for CPU access) and a physical address (for DMA descriptor programming).

NVMe Storage Driver (nvme.c)

Overview

The NVMe driver implements the NVMe 1.4 admin and I/O command sets. It operates synchronously (doorbell + poll completion) with no interrupts.

Device Detection

const pcie_device_t *dev = pcie_find_device(0x01, 0x08, 0x02);
/* class=0x01 (storage), subclass=0x08 (NVM), progif=0x02 (NVMe) */

Initialization Sequence

1. Find NVMe controller via pcie_find_device(0x01, 0x08, 0x02)
2. Map BAR0 (4 pages = 16 KB) as uncached MMIO
3. Read doorbell stride from CAP[55:52]
4. Disable controller: CC.EN=0, wait CSTS.RDY=0
5. Allocate admin SQ + CQ (64 entries each, 1 page each)
6. Program AQA/ASQ/ACQ, set CC.EN=1, wait CSTS.RDY=1
7. Identify Controller (admin cmd 0x06, CNS=1)
8. Identify Namespace NSID=1 (admin cmd 0x06, CNS=0)
9. Create I/O CQ (admin cmd 0x05) + I/O SQ (admin cmd 0x01)
10. Allocate bounce buffer (1 page)
11. Register blkdev_t "nvme0"

Register Layout (BAR0)

typedef struct __attribute__((packed)) {
    uint64_t cap;       /* 0x00: Controller Capabilities */
    uint32_t vs;        /* 0x08: Version */
    uint32_t intms;     /* 0x0C: Interrupt Mask Set */
    uint32_t intmc;     /* 0x10: Interrupt Mask Clear */
    uint32_t cc;        /* 0x14: Controller Configuration */
    uint32_t reserved0; /* 0x18 */
    uint32_t csts;      /* 0x1C: Controller Status */
    uint32_t nssr;      /* 0x20: NVM Subsystem Reset */
    uint32_t aqa;       /* 0x24: Admin Queue Attributes */
    uint64_t asq;       /* 0x28: Admin SQ Base Address */
    uint64_t acq;       /* 0x30: Admin CQ Base Address */
} nvme_regs_t;

Queue Model

         CPU                              NVMe Controller
         ----                             ----------------
  SQ Tail Doorbell  ──────────>  reads SQE from Submission Queue
                                          |
                                    processes command
                                          |
  polls CQ phase tag  <──────────  writes CQE to Completion Queue
  CQ Head Doorbell    ──────────>  controller advances CQ

Doorbell addressing: 0x1000 + (2 * qid + is_cq) * (4 << DSTRD), where is_cq is 1 for CQ head doorbells and 0 for SQ tail doorbells

  • Admin queue: QID=0, depth=64 entries
  • I/O queue: QID=1, depth=64 entries

Submission Queue Entry (64 bytes)

typedef struct __attribute__((packed)) {
    uint32_t cdw0;   /* opcode[7:0] | cid[31:16] */
    uint32_t nsid;
    uint64_t reserved;
    uint64_t mptr;   /* Metadata Pointer */
    uint64_t prp1;   /* Physical Region Page 1 */
    uint64_t prp2;   /* Physical Region Page 2 */
    uint32_t cdw10;
    uint32_t cdw11;
    uint32_t cdw12;  /* NLB (0-based count) for read/write */
    uint32_t cdw13, cdw14, cdw15;
} nvme_sqe_t;

Completion Polling

The poll_cqe() function busy-polls the completion queue checking the phase tag bit:

while (timeout--) {
    entry = &cq[*cq_head];
    if ((entry->status & 1u) == *phase) {
        /* CID match check, advance head, flip phase on wrap */
        return (status_code == 0) ? 0 : -1;
    }
}

Phase tags alternate between 0 and 1 on each full traversal of the ring, allowing the driver to distinguish new completions from stale entries without clearing the ring.

I/O Operations

Read and write use a shared bounce buffer (1 page) under nvme_lock:

static int nvme_blkdev_read(struct blkdev *dev, uint64_t lba,
                            uint32_t count, void *buf);

  • Maximum transfer: 4096 bytes (8 sectors at 512 B each) per operation
  • Opcode 0x02 (Read) or 0x01 (Write)
  • sfence (arch_wmb()) between SQE write and doorbell ring

xHCI USB Host Controller (xhci.c)

Overview

The xHCI driver implements USB device enumeration and HID boot-protocol keyboard/mouse support. It is polling-based (100 Hz via PIT), with no MSI/MSI-X.

Device Detection

const pcie_device_t *dev = pcie_find_device(0x0C, 0x03, 0x30);
/* class=0x0C (serial bus), subclass=0x03 (USB), progif=0x30 (xHCI) */

Initialization Sequence

1. Locate xHCI via pcie_find_device(0x0C, 0x03, 0x30)
2. Map BAR0 MMIO (256 pages = 1 MiB, uncached)
3. BIOS handoff: claim ownership via USBLEGSUP capability
4. Stop controller (clear USBCMD.RS), wait HCH=1
5. Reset controller (HCRST), wait CNR=0
6. Allocate DCBAA (Device Context Base Address Array)
7. Allocate Command Ring (64 TRBs) and Event Ring (64 TRBs)
8. Configure MaxSlotsEn, DCBAAP, Command Ring pointer
9. Set up Event Ring Segment Table (ERST)
10. Start controller (USBCMD.RS = 1)
11. Enumerate ports: reset, Enable Slot, Address Device, Configure EP
12. For each HID device: schedule first interrupt IN transfer

Register Spaces

The xHCI register layout spans several regions within BAR0:

Region                 Offset                    Size       Purpose
Capability Registers   BAR0 + 0x00               CAPLENGTH  Version, structural params, offsets
Operational Registers  BAR0 + CAPLENGTH          variable   USBCMD, USBSTS, DCBAAP, CRCR
Port Registers         BAR0 + CAPLENGTH + 0x400  16 B/port  PORTSC, PORTPMSC
Runtime Registers      BAR0 + RTSOFF             variable   Interrupter management, ERDP
Doorbell Registers     BAR0 + DBOFF              4 B/slot   Command/transfer ring kick

BIOS Handoff

Before initializing, the driver claims ownership from the BIOS via the USB Legacy Support (USBLEGSUP) extended capability:

  1. Walk extended capability list starting at HCCPARAMS1[31:16] << 2
  2. Find capability with ID=1 (USBLEGSUP)
  3. Set OS_OWNED bit (bit 24)
  4. Wait for BIOS_OWNED bit (bit 16) to clear
  5. Disable SMI generation via USBLEGCTLSTS

TRB (Transfer Request Block)

All xHCI rings use 16-byte TRB entries:

typedef struct __attribute__((packed)) {
    uint64_t param;    /* DMA address or inline data */
    uint32_t status;   /* Transfer length, interrupter target */
    uint32_t control;  /* Cycle bit [0], Type [15:10], flags */
} xhci_trb_t;

Key TRB types:

Type                Value  Purpose
Normal                  1  Data transfer (interrupt IN)
Setup                   2  USB control transfer setup stage
Data                    3  USB control transfer data stage
Status                  4  USB control transfer status stage
Link                    6  Ring wrap-around
Enable Slot             9  Allocate a device slot
Address Device         11  Assign USB address
Configure EP           12  Configure endpoints
Transfer Event         32  Transfer completion event
Command Completion     33  Command ring completion
Port Status Change     34  Port connect/disconnect

Device Enumeration

For each port with a connected device:

  1. Port reset: Set PORTSC.PR, wait for PRC (Port Reset Change)
  2. Enable Slot: Submit Enable Slot command, get slot ID from completion
  3. Address Device: Build input context with slot and EP0 contexts, submit Address Device
  4. GET_DESCRIPTOR: USB control transfer to read device/config/HID descriptors
  5. SET_CONFIGURATION: Activate the first configuration
  6. Configure Endpoint: Submit Configure Endpoint with interrupt IN endpoint
  7. Schedule interrupt IN: Start polling for HID reports

HID Polling

xhci_poll() is called from the PIT ISR at 100 Hz. It drains the event ring looking for Transfer Event completions. For each completed interrupt IN transfer on a HID endpoint:

  • Keyboard reports (8 bytes) are dispatched to usb_hid_process_report()
  • Mouse reports (3+ bytes) are dispatched to usb_mouse_process_report()
  • A new interrupt IN transfer is immediately rescheduled

Network Drivers

virtio-net (virtio_net.c)

Overview

The virtio-net driver implements the virtio 1.0 modern transport (PCI capability-list MMIO). It is used with QEMU’s default virtual NIC.

Device Detection

/* vendor 0x1AF4, device 0x1041 (modern) or 0x1000 (legacy-transitional) */
if (d->vendor_id == 0x1AF4 &&
    (d->device_id == 0x1041 || d->device_id == 0x1000))

PCI Capability Walk

The driver walks the PCI capability list to locate three virtio-specific structures:

Sub-type  Name        Purpose
1         COMMON_CFG  Feature negotiation, queue setup, device status
2         NOTIFY_CFG  Doorbell MMIO base + per-queue multiplier
4         DEVICE_CFG  Device-specific config (MAC address bytes 0-5)

Feature Negotiation

RESET → ACKNOWLEDGE → DRIVER → negotiate features → FEATURES_OK → DRIVER_OK

Negotiated features:

  • VIRTIO_NET_F_MAC (bit 5): Device provides MAC address
  • VIRTIO_F_VERSION_1 (bit 32): Required for modern (non-transitional) devices

Virtqueue Layout

Each virtqueue (RX=0, TX=1) has 256 entries across three DMA regions:

Component         Size             Purpose
Descriptor Table  4096 B (1 page)  256 x 16 B descriptors
Available Ring    520 B (1 page)   Driver-to-device ring
Used Ring         2056 B (1 page)  Device-to-driver ring

typedef struct __attribute__((packed)) {
    uint64_t addr;    /* physical address of buffer */
    uint32_t len;     /* length in bytes */
    uint16_t flags;   /* NEXT (1), WRITE (2) */
    uint16_t next;    /* next descriptor index */
} virtq_desc_t;

RX Path

Pre-filled with 256 receive buffers (1 page each). virtio_net_poll() drains the used ring:

  1. Read (id, len) from used ring entry
  2. Skip 12-byte virtio_net_hdr (modern header always 12 bytes, not 10)
  3. Call netdev_rx_deliver() with the Ethernet frame
  4. Return descriptor to available ring
  5. Always kick RX doorbell (critical for QEMU TCG/SLIRP compatibility)

The unconditional doorbell kick is necessary because in TCG mode, the guest vCPU and SLIRP share a thread. The MMIO write forces a VM-exit, giving SLIRP time to process pending RX frames.

TX Path

static int virtio_net_send(netdev_t *dev, const void *pkt, uint16_t len);

  1. Zero 12-byte virtio_net_hdr in bounce buffer, copy Ethernet frame after it
  2. Set descriptor: len = hdr_size + frame_size, flags = 0 (device reads)
  3. Publish to available ring, sfence, kick TX doorbell
  4. Synchronous poll for completion (up to 100,000 iterations)

RTL8169 (rtl8169.c)

Overview

Driver for the Realtek RTL8168/8169 PCIe gigabit Ethernet family. Polling-mode, single-instance. Tested via VFIO PCI passthrough.

Device Detection

/* vendor 0x10EC, device 0x8168 */
if (d->vendor_id == 0x10EC && d->device_id == 0x8168)

Initialization Sequence

1. Locate RTL8168 in PCIe device table
2. PCI: enable Memory + Bus Master in command register
3. Wake to D0 via PMCSR (Power Management capability)
4. Map BAR2 as 1 page of uncached MMIO
5. Sanity probe: read MAC0, check for 0xFFFFFFFF (dead device)
6. Soft reset: write CmdReset to ChipCmd, poll until clear
7. Read MAC from MAC0/MAC4 registers
8. PHY reset + auto-negotiation restart via MDIO
9. Allocate RX/TX descriptor rings (256 x 16B = 1 page each)
10. Allocate 256 RX + 256 TX DMA buffers (1 page each = 512 pages)
11. Pre-fill RX descriptors (DescOwn, RingEnd on last)
12. Configure registers (unlock → CPlusCmd → RxMaxSize → ring addrs → lock)
13. Clear RXDV_GATED_EN in MISC register
14. Disable IRQs, clear pending status
15. Enable RX + TX (ChipCmd), configure RxConfig/TxConfig
16. Register netdev "eth0"

Register Access

MMIO registers are accessed via inline functions:

static inline uint8_t  rd8 (uint16_t off) { return *(volatile uint8_t  *)(mmio + off); }
static inline void     wr8 (uint16_t off, uint8_t  v) { *(volatile uint8_t  *)(mmio + off) = v; }
/* similarly for rd16/wr16/rd32/wr32 */

Key registers:

Register           Offset     Width   Purpose
MAC0/MAC4          0x00/0x04  32-bit  MAC address (eFuse-latched)
ChipCmd            0x37       8-bit   Reset, RX/TX enable
TxPoll             0x38       8-bit   Kick TX queue
IntrMask/IntrStat  0x3C/0x3E  16-bit  Interrupt control (disabled)
TxConfig/RxConfig  0x40/0x44  32-bit  DMA burst, IFG, accept mask
Cfg9346            0x50       8-bit   Config register lock/unlock
PHYAR              0x60       32-bit  MII PHY register access
PHY_STATUS         0x6C       8-bit   Link status
RxMaxSize          0xDA       16-bit  Max frame size
CPlusCmd           0xE0       16-bit  Checksum offload
RxLow/RxHigh       0xE4/0xE8  32-bit  RX ring base address
TxLow/TxHigh       0x20/0x24  32-bit  TX ring base address
MISC               0xF0       32-bit  RXDV_GATED_EN at bit 19

Descriptor Format (16 bytes)

typedef struct __attribute__((packed)) {
    volatile uint32_t opts1;   /* DescOwn|RingEnd|First|Last|len */
    volatile uint32_t opts2;   /* VLAN/checksum offload (unused) */
    volatile uint64_t addr;    /* physical buffer address */
} rtl_desc_t;

Descriptor flags:

Flag             Bit     Meaning
DESC_OWN         31      Device owns this descriptor
DESC_RING_END    30      Last descriptor in ring (wrap)
DESC_FIRST_FRAG  29      First fragment of frame
DESC_LAST_FRAG   28      Last fragment of frame
(length field)   [13:0]  Frame length

PHY Management (MDIO)

PHY registers are accessed via the PHYAR register at offset 0x60:

/* Write: set bit 31 + (reg << 16) + value, poll bit 31 clear */
static void mdio_write(uint8_t reg, uint16_t value);
/* Read: set (reg << 16), poll bit 31 set, read low 16 bits */
static uint16_t mdio_read(uint8_t reg);

At init, the driver resets the PHY and restarts auto-negotiation:

mdio_write(MII_BMCR, MII_BMCR_RESET | MII_BMCR_ANEG_EN | MII_BMCR_ANEG_RST);

RX Polling

rtl8169_poll() drains the RX ring:

  1. Check opts1 & DESC_OWN – stop if device still owns the descriptor
  2. Extract frame length from opts1 & 0x3FFF
  3. Strip 4-byte FCS (RTL includes it in the reported length)
  4. Call netdev_rx_deliver(dev, buf, len - 4)
  5. Hand descriptor back with DESC_OWN | RX_BUF_SIZE
  6. sfence between descriptor write and opts1 update

TX Send

  1. Check slot availability (opts1 & DESC_OWN)
  2. Copy frame to bounce buffer
  3. Set opts1 = DESC_OWN | DESC_FIRST_FRAG | DESC_LAST_FRAG | len
  4. sfence, then kick TX via TxPoll = 0x40 (NPQ bit)

Framebuffer Driver (fb.c)

Overview

The framebuffer driver provides a linear 32-bpp text terminal using an 8x16 VGA font. It maps the hardware framebuffer (provided by GRUB/UEFI GOP) into kernel address space.

Features

  • Boot splash: fb_boot_splash() displays the Aegis logo on a dark background during boot
  • Panic screen: panic_bluescreen() renders exception details with a Terminus 20px font on a blue background
  • Heartbeat: fb_heartbeat() toggles a pixel block to indicate IRQ health
  • Compositor handoff: fb_lock_compositor() suppresses kernel text output when a userspace compositor maps the framebuffer

API

void fb_init(void);           /* Map FB, clear screen, init font */
void fb_putchar(char c);      /* Render one character */
void fb_write_string(const char *s);  /* Render NUL-terminated string */
void fb_lock_compositor(void); /* Suppress kernel text output */

int fb_get_phys_info(uint64_t *phys_out, uint32_t *width_out,
                     uint32_t *height_out, uint32_t *pitch_out);

Ramdisk Driver (ramdisk.c)

Overview

The ramdisk driver maps a GRUB multiboot2 module into kernel memory and presents it as a block device. Used for the root filesystem image and optionally an ESP (EFI System Partition) image.

Initialization

void ramdisk_init(uint64_t phys_base, uint64_t size);
void ramdisk_init2(uint64_t phys_base, uint64_t size);  /* second ramdisk */

The module is copied into freshly allocated KVA pages rather than mapped in-place, because GRUB places modules near the kernel image where they could overlap with VMM page tables.

Block Device Interface

static int ramdisk_read(blkdev_t *dev, uint64_t lba, uint32_t count, void *buf) {
    uint64_t off = lba * 512;
    uint64_t len = (uint64_t)count * 512;   /* widen before multiply */
    if (off + len < off || off + len > s_size) return -1;  /* bounds + wrap check */
    memcpy(buf, s_base + off, len);
    return 0;
}

Registers as "ramdisk0" (root filesystem) and "ramdisk1" (ESP).

USB HID Drivers

Keyboard (usb_hid.c)

Processes 8-byte HID boot-protocol keyboard reports:

typedef struct __attribute__((packed)) {
    uint8_t modifier;   /* bitfield: Ctrl, Shift, Alt, GUI */
    uint8_t reserved;
    uint8_t keys[6];    /* HID usage IDs, 0 = no key */
} usb_hid_report_t;

  • Compares current report with previous to detect press/release transitions
  • Translates HID usage IDs to ASCII via lookup tables (hid_to_ascii[], hid_to_ascii_shift[])
  • Supports Shift, Ctrl modifiers
  • Injects keystrokes into the kernel keyboard ring buffer via kbd_usb_inject()

Mouse (usb_mouse.c)

Processes 3-byte HID boot-protocol mouse reports (buttons, signed 8-bit dx, signed 8-bit dy); each report is widened into an internal event structure:

typedef struct __attribute__((packed)) {
    uint8_t  buttons;   /* bit 0=left, 1=right, 2=middle */
    int16_t  dx;        /* X delta */
    int16_t  dy;        /* Y delta */
    int16_t  scroll;    /* reserved (boot protocol) */
} mouse_event_t;

Events are stored in a 128-entry ring buffer. /dev/mouse reads from this buffer with blocking (mouse_read_blocking()) or non-blocking (mouse_poll()) semantics.

Driver Summary

Driver       PCI Match       BAR              DMA Pages           Polling                 Registration
NVMe         01:08:02        BAR0 (16 KB)     ~7 pages            Sync doorbell+poll      blkdev_t "nvme0"
xHCI         0C:03:30        BAR0 (1 MiB)     ~30+ pages          100 Hz event ring       Internal slot table
virtio-net   1AF4:1041/1000  3 BARs via caps  ~520 pages          100 Hz used ring        netdev_t "eth0"
RTL8169      10EC:8168       BAR2 (4 KB)      ~514 pages          100 Hz descriptor ring  netdev_t "eth0"
Framebuffer  N/A (UEFI GOP)  N/A              Mapped from GRUB    N/A                     fb_putchar()
Ramdisk      N/A (module)    N/A              Copied from module  N/A                     blkdev_t "ramdisk0/1"

Security Considerations

Device drivers are the kernel’s interface to hardware and represent a significant attack surface, particularly for DMA-capable PCI devices:

  • No IOMMU: Aegis v1 does not program the IOMMU (VT-d / AMD-Vi). DMA-capable devices have unrestricted access to all physical memory. A compromised or malicious PCI device could read or write arbitrary memory, bypassing all software protections.
  • Descriptor trust: All drivers trust values returned by hardware (descriptor lengths, status fields, ring indices). A device returning a crafted RX length larger than the buffer size would cause an out-of-bounds read in netdev_rx_deliver().
  • No input validation on MMIO reads: Register values read from device MMIO are used directly in arithmetic (e.g., descriptor lengths, ring indices). Integer overflow or unexpected values could cause memory corruption.
  • Single-instance static state: Each driver uses file-scoped static variables. There is no driver isolation – a bug in one driver can corrupt kernel memory and affect all other subsystems.
  • PMM frame leaks: The BAR mapping pattern leaks the original PMM frames allocated by kva_alloc_pages(). While not a security vulnerability per se, it represents the kind of resource management shortcut typical of v1 code.

These are not hypothetical risks – they are the expected reality of C device drivers in a from-scratch OS. Production hardening would require IOMMU programming, bounds-checked DMA descriptor processing, and eventually rewriting drivers in Rust with safe hardware abstraction layers.