Device Drivers
Overview of Aegis OS device drivers: PCI enumeration, NVMe, xHCI USB, network NICs, framebuffer, and ramdisk
Aegis includes a set of polling-based device drivers for storage, networking, USB input, and display. All driver code is written in C and interacts directly with hardware via MMIO and DMA – the most security-sensitive kind of kernel code. As v1 software, these drivers have not been formally audited and should be assumed to contain bugs that could be triggered by malicious or malformed hardware responses (e.g., a compromised NIC returning crafted DMA descriptors). This is expected for any from-scratch driver stack at this maturity level. The planned C-to-Rust kernel migration will eventually cover driver code, but drivers are not the first target – the capability system (kernel/cap/src/lib.rs) is the initial Rust beachhead.
Contributions are welcome – file issues or propose changes at exec/aegis.
The PCI drivers (NVMe, xHCI, virtio-net, RTL8169) follow a common pattern:
- PCI enumeration via the ECAM config space scanner
- BAR MMIO mapping into kernel virtual address space (uncached)
- DMA buffer allocation via kva_alloc_pages() + kva_page_phys()
- No MSI/MSI-X interrupts: the xHCI and NIC drivers are polled at 100 Hz from the PIT timer ISR, while NVMe polls synchronously for each command's completion
- Registration with a subsystem abstraction layer (blkdev_t for storage, netdev_t for NICs)
PCI Express Enumeration (pcie.h / pcie.c)
ECAM Config Space
PCIe configuration space is accessed via ECAM (Enhanced Configuration Access Mechanism). The MCFG ACPI table provides the MMIO base address. The scanner iterates all bus/device/function combinations:
```c
#define PCIE_MAX_DEVICES 64

typedef struct {
    uint16_t vendor_id;
    uint16_t device_id;
    uint8_t  class_code;
    uint8_t  subclass;
    uint8_t  progif;
    uint8_t  bus, dev, fn;
    uint64_t bar[6];   /* decoded BAR base addresses (64-bit aware) */
} pcie_device_t;
```
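The bar[] field is described as 64-bit aware. The following sketch shows one way to decode raw BAR registers using the standard PCI encoding. Bit 0 selects I/O space. Bits [2:1] = 0b10 mark a 64-bit memory BAR whose upper address half lives in the next slot. The helper name decode_bars() and its array interface are hypothetical and are not taken from pcie.c. BAR sizing is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode six raw BAR dwords (config offsets
 * 0x10..0x24) into base addresses. A 64-bit memory BAR consumes two
 * slots; the slot holding the upper half is reported as 0. */
static void decode_bars(const uint32_t raw[6], uint64_t out[6])
{
    for (int i = 0; i < 6; i++) {
        uint32_t lo = raw[i];
        out[i] = 0;
        if (lo & 1u) {                        /* I/O space BAR */
            out[i] = lo & ~0x3u;
            continue;
        }
        uint32_t type = (lo >> 1) & 0x3u;     /* 0 = 32-bit, 2 = 64-bit */
        uint64_t base = lo & ~0xFu;           /* strip type/prefetch flags */
        if (type == 2 && i < 5) {
            base |= (uint64_t)raw[i + 1] << 32;
            out[i] = base;
            out[i + 1] = 0;                   /* upper half is not a BAR */
            i++;                              /* skip the consumed slot */
            continue;
        }
        out[i] = base;
    }
}
```

For example, raw values of 0xFEB0000C and 0x1 in slots 0 and 1 decode to a base of 0x1FEB00000 in slot 0. Slot 1 is zeroed because it only held the upper address bits.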
pcie_init() populates a table of up to 64 devices (PCIE_MAX_DEVICES). Drivers locate their hardware via:
- pcie_find_device(class, subclass, progif): match by class code (pass 0xFF to match any field)
- pcie_get_devices() / pcie_device_count(): iterate the full device table
Config Space Accessors
```c
uint8_t  pcie_read8 (uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off);
uint16_t pcie_read16(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off);
uint32_t pcie_read32(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off);
void     pcie_write32(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off, uint32_t val);
```
These are used by drivers to read/write PCI capability lists, command registers, and BAR-related configuration.
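Each of these accessors resolves to a load or store inside the MCFG window. The sketch below shows the standard ECAM address arithmetic: 1 MiB per bus, 32 KiB per device and 4 KiB per function. It adds a 16-bit read built from an aligned 32-bit access. ecam_offset(), ecam_read16() and the ecam_base parameter are illustrative names and are not the pcie.c API.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a config register within the ECAM window. */
static uint64_t ecam_offset(uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off)
{
    return ((uint64_t)bus << 20) |
           ((uint64_t)(dev & 0x1F) << 15) |
           ((uint64_t)(fn & 0x7) << 12) |
           (off & 0xFFFu);
}

/* 16-bit read via an aligned 32-bit load plus shift. `ecam_base` stands
 * for the (hypothetical) kernel VA where the MCFG window is mapped. */
static uint16_t ecam_read16(volatile const uint8_t *ecam_base,
                            uint8_t bus, uint8_t dev, uint8_t fn, uint16_t off)
{
    volatile const uint32_t *p = (volatile const uint32_t *)
        (ecam_base + ecam_offset(bus, dev, fn, (uint16_t)(off & ~3u)));
    return (uint16_t)(*p >> ((off & 2u) * 8));
}
```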
BAR Mapping Pattern
All drivers follow the same pattern to map BAR MMIO into kernel address space:
```c
#define MMIO_FLAGS (VMM_FLAG_WRITABLE | VMM_FLAG_WC | VMM_FLAG_UCMINUS)

static uintptr_t map_bar(uint64_t pa, uint32_t n_pages)
{
    uintptr_t va = (uintptr_t)kva_alloc_pages(n_pages);
    for (uint32_t i = 0; i < n_pages; i++) {
        uintptr_t page_va = va + i * 4096;
        vmm_unmap_page(page_va);                           /* remove kva default mapping */
        vmm_map_page(page_va, pa + i * 4096, MMIO_FLAGS);
    }
    return va;
}
```
kva_alloc_pages() allocates PMM frames and maps them; the mapping is then overwritten with the BAR physical address using uncached MMIO flags. The original PMM frames are leaked (acceptable during init).
DMA Page Allocation
```c
static void alloc_dma_page(uint64_t *phys_out, uintptr_t *virt_out)
{
    uintptr_t va = (uintptr_t)kva_alloc_pages(1);
    uint64_t pa = kva_page_phys((void *)va);
    memset((void *)va, 0, 4096);
    *phys_out = pa;
    *virt_out = va;
}
```
Returns both a kernel virtual address (for CPU access) and a physical address (for DMA descriptor programming).
NVMe Storage Driver (nvme.c)
Overview
The NVMe driver implements the NVMe 1.4 admin and I/O command sets. It operates synchronously (doorbell + poll completion) with no interrupts.
Device Detection
```c
const pcie_device_t *dev = pcie_find_device(0x01, 0x08, 0x02);
/* class=0x01 (storage), subclass=0x08 (NVM), progif=0x02 (NVMe) */
```
Initialization Sequence
1. Find NVMe controller via pcie_find_device(0x01, 0x08, 0x02)
2. Map BAR0 (4 pages = 16 KB) as uncached MMIO
3. Read doorbell stride from CAP.DSTRD (CAP[35:32])
4. Disable controller: CC.EN=0, wait CSTS.RDY=0
5. Allocate admin SQ + CQ (64 entries each, 1 page each)
6. Program AQA/ASQ/ACQ, set CC.EN=1, wait CSTS.RDY=1
7. Identify Controller (admin cmd 0x06, CNS=1)
8. Identify Namespace NSID=1 (admin cmd 0x06, CNS=0)
9. Create I/O CQ (admin cmd 0x05) + I/O SQ (admin cmd 0x01)
10. Allocate bounce buffer (1 page)
11. Register blkdev_t "nvme0"
Register Layout (BAR0)
```c
typedef struct __attribute__((packed)) {
    uint64_t cap;        /* 0x00: Controller Capabilities */
    uint32_t vs;         /* 0x08: Version */
    uint32_t intms;      /* 0x0C: Interrupt Mask Set */
    uint32_t intmc;      /* 0x10: Interrupt Mask Clear */
    uint32_t cc;         /* 0x14: Controller Configuration */
    uint32_t reserved0;  /* 0x18 */
    uint32_t csts;       /* 0x1C: Controller Status */
    uint32_t nssr;       /* 0x20: NVM Subsystem Reset */
    uint32_t aqa;        /* 0x24: Admin Queue Attributes */
    uint64_t asq;        /* 0x28: Admin SQ Base Address */
    uint64_t acq;        /* 0x30: Admin CQ Base Address */
} nvme_regs_t;
```
Queue Model
```
CPU                                   NVMe Controller
----                                  ----------------
SQ Tail Doorbell    ──────────>   reads SQE from Submission Queue
                                              |
                                      processes command
                                              |
polls CQ phase tag  <──────────   writes CQE to Completion Queue
CQ Head Doorbell    ──────────>   controller frees consumed CQ slots
```
Doorbell addressing: 0x1000 + (2 * qid + is_head) * (4 << DSTRD)
- Admin queue: QID=0, depth=64 entries
- I/O queue: QID=1, depth=64 entries
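The sketch below illustrates the doorbell arithmetic above and assumes CAP has already been read from BAR0 offset 0x00. In the NVMe specification, CAP.DSTRD occupies bits 35:32, and consecutive doorbells are 4 << DSTRD bytes apart. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CAP.DSTRD: bits 35:32 of the 64-bit CAP register. */
static unsigned nvme_cap_dstrd(uint64_t cap)
{
    return (unsigned)((cap >> 32) & 0xF);
}

/* Doorbell offset from BAR0 for queue `qid`:
 * is_head = 0 -> SQ tail doorbell, is_head = 1 -> CQ head doorbell. */
static uint32_t nvme_doorbell_off(uint16_t qid, int is_head, unsigned dstrd)
{
    return 0x1000u + (2u * qid + (is_head ? 1u : 0u)) * (4u << dstrd);
}
```

With DSTRD = 0, the admin SQ tail doorbell sits at 0x1000 and the admin CQ head doorbell at 0x1004. The I/O queue pair (QID 1) follows at 0x1008 and 0x100C.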
Submission Queue Entry (64 bytes)
```c
typedef struct __attribute__((packed)) {
    uint32_t cdw0;       /* opcode[7:0] | cid[31:16] */
    uint32_t nsid;
    uint64_t reserved;
    uint64_t mptr;       /* Metadata Pointer */
    uint64_t prp1;       /* Physical Region Page 1 */
    uint64_t prp2;       /* Physical Region Page 2 */
    uint32_t cdw10;
    uint32_t cdw11;
    uint32_t cdw12;      /* NLB (0-based count) for read/write */
    uint32_t cdw13, cdw14, cdw15;
} nvme_sqe_t;
```
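The following sketch of a single-PRP command builder shows how the fields above are filled for a read or write. It follows the NVMe command layout: the starting LBA is split across CDW10 and CDW11, and the 0-based block count goes in CDW12[15:0]. nvme_build_rw() and the NVME_OP_* names are hypothetical. The struct is repeated so that the block compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct __attribute__((packed)) {
    uint32_t cdw0;
    uint32_t nsid;
    uint64_t reserved;
    uint64_t mptr;
    uint64_t prp1;
    uint64_t prp2;
    uint32_t cdw10, cdw11, cdw12, cdw13, cdw14, cdw15;
} nvme_sqe_t;

static_assert(sizeof(nvme_sqe_t) == 64, "SQE must be 64 bytes");

#define NVME_OP_WRITE 0x01
#define NVME_OP_READ  0x02

/* Hypothetical helper: fill a read/write command whose data fits in one
 * page. `nblocks` must be >= 1 (at most 8 for 512-byte sectors). */
static void nvme_build_rw(nvme_sqe_t *sqe, uint8_t opcode, uint16_t cid,
                          uint32_t nsid, uint64_t slba, uint16_t nblocks,
                          uint64_t prp1)
{
    memset(sqe, 0, sizeof *sqe);
    sqe->cdw0  = (uint32_t)opcode | ((uint32_t)cid << 16);
    sqe->nsid  = nsid;
    sqe->prp1  = prp1;                     /* single page: PRP2 stays 0 */
    sqe->cdw10 = (uint32_t)slba;           /* starting LBA, low 32 bits */
    sqe->cdw11 = (uint32_t)(slba >> 32);   /* starting LBA, high 32 bits */
    sqe->cdw12 = (uint32_t)(nblocks - 1);  /* NLB is 0-based */
}
```

The driver transfers at most one 4 KiB bounce page, so PRP1 alone is enough and PRP2 stays zero.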
Completion Polling
The poll_cqe() function busy-polls the completion queue checking the phase tag bit:
while (timeout--) {
entry = &cq[*cq_head];
if ((entry->status & 1u) == *phase) {
/* CID match check, advance head, flip phase on wrap */
return (status_code == 0) ? 0 : -1;
}
}
Phase tags alternate between 0 and 1 on each full traversal of the ring, allowing the driver to distinguish new completions from stale entries without clearing the ring.
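The phase-tag logic can be exercised without hardware. The sketch below uses a simplified, hypothetical cqe_sim_t in which bit 0 of status is the phase tag, matching the poll_cqe() excerpt. The ring starts zeroed and the driver initially expects phase 1. The expected phase flips each time the head wraps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified CQE: bit 0 of `status` is the phase tag. */
typedef struct { uint16_t cid; uint16_t status; } cqe_sim_t;

/* Consume one completion if present. Returns the slot index consumed,
 * or -1 if the entry at the head still carries the stale phase. */
static int cq_consume(const volatile cqe_sim_t *cq, uint16_t depth,
                      uint16_t *head, uint8_t *phase)
{
    const volatile cqe_sim_t *e = &cq[*head];
    if ((e->status & 1u) != *phase)
        return -1;                     /* not written on this pass yet */
    int slot = *head;
    if (++*head == depth) {            /* wrapped: expect the inverted tag */
        *head = 0;
        *phase ^= 1u;
    }
    return slot;
}
```

In a real driver, each consumed entry is followed by a write of the new head to the CQ head doorbell so that the controller can reuse the slot.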
I/O Operations
Read and write use a shared bounce buffer (1 page) under nvme_lock:
```c
static int nvme_blkdev_read(struct blkdev *dev, uint64_t lba,
                            uint32_t count, void *buf);
```
- Maximum transfer: 4096 bytes (8 sectors of 512 B) per operation
- Opcode 0x02 (Read) or 0x01 (Write)
- sfence (arch_wmb()) between the SQE write and the doorbell ring
xHCI USB Host Controller (xhci.c)
Overview
The xHCI driver implements USB device enumeration and HID boot-protocol keyboard/mouse support. It is polling-based (100 Hz via PIT), with no MSI/MSI-X.
Device Detection
```c
const pcie_device_t *dev = pcie_find_device(0x0C, 0x03, 0x30);
/* class=0x0C (serial bus), subclass=0x03 (USB), progif=0x30 (xHCI) */
```
Initialization Sequence
1. Locate xHCI via pcie_find_device(0x0C, 0x03, 0x30)
2. Map BAR0 MMIO (256 pages = 1 MiB, uncached)
3. BIOS handoff: claim ownership via USBLEGSUP capability
4. Stop controller (clear USBCMD.RS), wait HCH=1
5. Reset controller (HCRST), wait CNR=0
6. Allocate DCBAA (Device Context Base Address Array)
7. Allocate Command Ring (64 TRBs) and Event Ring (64 TRBs)
8. Configure MaxSlotsEn, DCBAAP, Command Ring pointer
9. Set up Event Ring Segment Table (ERST)
10. Start controller (USBCMD.RS = 1)
11. Enumerate ports: reset, Enable Slot, Address Device, Configure EP
12. For each HID device: schedule first interrupt IN transfer
Register Spaces
The xHCI register layout spans several regions within BAR0:
| Region | Offset | Size | Purpose |
|---|---|---|---|
| Capability Registers | BAR0 + 0x00 | CAPLENGTH | Version, structural params, offsets |
| Operational Registers | BAR0 + CAPLENGTH | variable | USBCMD, USBSTS, DCBAAP, CRCR |
| Port Registers | BAR0 + CAPLENGTH + 0x400 | 16B per port | PORTSC, PORTPMSC |
| Runtime Registers | BAR0 + RTSOFF | variable | Interrupter management, ERDP |
| Doorbell Registers | BAR0 + DBOFF | 4B per slot | Command/transfer ring kick |
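The sketch below computes the region offsets in this table from the capability registers, using the xHCI field positions. CAPLENGTH is the byte at 0x00. DBOFF (at 0x14) has bits 1:0 reserved, and RTSOFF (at 0x18) has bits 4:0 reserved. The xhci_layout_t type and the helper names are hypothetical. A real driver would obtain the three inputs with MMIO reads from BAR0.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets (from BAR0) of the main xHCI register regions. */
typedef struct {
    uint32_t op;   /* operational registers */
    uint32_t rt;   /* runtime registers */
    uint32_t db;   /* doorbell array */
} xhci_layout_t;

static xhci_layout_t xhci_layout(uint8_t caplength, uint32_t dboff, uint32_t rtsoff)
{
    xhci_layout_t l;
    l.op = caplength;
    l.db = dboff & ~0x3u;     /* DBOFF bits 1:0 are reserved */
    l.rt = rtsoff & ~0x1Fu;   /* RTSOFF bits 4:0 are reserved */
    return l;
}

/* PORTSC of 1-based port n: 16-byte register sets starting at op + 0x400. */
static uint32_t xhci_portsc_off(const xhci_layout_t *l, unsigned port)
{
    return l->op + 0x400u + 0x10u * (port - 1u);
}

/* Doorbell register for a slot; slot 0 is the command ring doorbell. */
static uint32_t xhci_doorbell_off(const xhci_layout_t *l, unsigned slot)
{
    return l->db + 4u * slot;
}
```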
BIOS Handoff
Before initializing, the driver claims ownership from the BIOS via the USB Legacy Support (USBLEGSUP) extended capability:
- Walk the extended capability list starting at byte offset HCCPARAMS1[31:16] << 2
- Find the capability with ID=1 (USBLEGSUP)
- Set the OS_OWNED bit (bit 24)
- Wait for the BIOS_OWNED bit (bit 16) to clear
- Disable SMI generation via USBLEGCTLSTS
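The capability walk in the first two steps can be sketched as follows. Each extended capability header holds its ID in bits 7:0. Bits 15:8 hold the offset of the next capability, in dwords relative to the current one, and a value of 0 ends the list. xhci_find_xcap() is a hypothetical helper. The n_words bound is a safety check added for this sketch and is not described in the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XHCI_XCAP_USBLEGSUP 1u

/* Return the byte offset (from BAR0) of the first extended capability
 * with `cap_id`, or 0 if absent. `mmio` is BAR0 viewed as 32-bit words;
 * `n_words` bounds the walk so a malformed list cannot leave the mapping. */
static uint32_t xhci_find_xcap(const volatile uint32_t *mmio, size_t n_words,
                               uint32_t hccparams1, uint8_t cap_id)
{
    uint32_t off = (hccparams1 >> 16) << 2;     /* xECP, converted to bytes */
    while (off != 0 && off / 4 < n_words) {
        uint32_t hdr = mmio[off / 4];
        if ((hdr & 0xFFu) == cap_id)
            return off;
        uint32_t next = (hdr >> 8) & 0xFFu;     /* dwords, relative */
        if (next == 0)
            break;
        off += next << 2;
    }
    return 0;
}
```

Once the USBLEGSUP offset is known, the handoff consists of setting bit 24 in that dword and polling until bit 16 clears.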
TRB (Transfer Request Block)
All xHCI rings use 16-byte TRB entries:
```c
typedef struct __attribute__((packed)) {
    uint64_t param;    /* DMA address or inline data */
    uint32_t status;   /* Transfer length, interrupter target */
    uint32_t control;  /* Cycle bit [0], Type [15:10], flags */
} xhci_trb_t;
```
Key TRB types:
| Type | Value | Purpose |
|---|---|---|
| Normal | 1 | Data transfer (interrupt IN) |
| Setup | 2 | USB control transfer setup stage |
| Data | 3 | USB control transfer data stage |
| Status | 4 | USB control transfer status stage |
| Link | 6 | Ring wrap-around |
| Enable Slot | 9 | Allocate a device slot |
| Address Device | 11 | Assign USB address |
| Configure EP | 12 | Configure endpoints |
| Transfer Event | 32 | Transfer completion event |
| Command Completion | 33 | Command ring completion |
| Port Status Change | 34 | Port connect/disconnect |
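Producer rings (command and transfer rings) wrap through a Link TRB. The sketch below assumes the common layout in which the last slot is a Link TRB with its Toggle Cycle flag (control bit 1) set. The producer writes each TRB with its current cycle state. On reaching the Link TRB, it hands that over as well and inverts the cycle state. trb_ring_t, ring_init() and ring_enqueue() are illustrative names. The source does not describe how xhci.c wraps its rings.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct __attribute__((packed)) {
    uint64_t param;
    uint32_t status;
    uint32_t control;
} xhci_trb_t;

static_assert(sizeof(xhci_trb_t) == 16, "TRBs are 16 bytes");

#define TRB_TYPE(t)    ((uint32_t)(t) << 10)
#define TRB_CYCLE      (1u << 0)
#define TRB_LINK_TC    (1u << 1)     /* Link TRB: Toggle Cycle */
#define TRB_TYPE_LINK  6u

typedef struct {
    xhci_trb_t *trbs;    /* ring memory; the last entry is a Link TRB */
    uint32_t    n;       /* total entries, including the Link TRB */
    uint32_t    enq;     /* enqueue index */
    uint32_t    cycle;   /* producer cycle state */
} trb_ring_t;

static void ring_init(trb_ring_t *r, xhci_trb_t *mem, uint32_t n, uint64_t mem_phys)
{
    memset(mem, 0, n * sizeof *mem);
    r->trbs = mem; r->n = n; r->enq = 0; r->cycle = 1;
    mem[n - 1].param   = mem_phys;                       /* link back to start */
    mem[n - 1].control = TRB_TYPE(TRB_TYPE_LINK) | TRB_LINK_TC;
}

static void ring_enqueue(trb_ring_t *r, uint64_t param, uint32_t status, uint32_t control)
{
    xhci_trb_t *t = &r->trbs[r->enq];
    t->param  = param;
    t->status = status;
    /* On real hardware, a write barrier belongs before this store. */
    t->control = (control & ~TRB_CYCLE) | r->cycle;
    if (++r->enq == r->n - 1) {                          /* reached the Link TRB */
        xhci_trb_t *link = &r->trbs[r->n - 1];
        link->control = (link->control & ~TRB_CYCLE) | r->cycle;
        r->enq = 0;
        r->cycle ^= 1u;
    }
}
```

The controller only processes TRBs whose cycle bit matches its own consumer state. It toggles that state when it follows a Link TRB with TC set, which keeps both sides in step across wraps.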
Device Enumeration
For each port with a connected device:
- Port reset: Set PORTSC.PR, wait for PRC (Port Reset Change)
- Enable Slot: Submit Enable Slot command, get slot ID from completion
- Address Device: Build input context with slot and EP0 contexts, submit Address Device
- GET_DESCRIPTOR: USB control transfer to read device/config/HID descriptors
- SET_CONFIGURATION: Activate the first configuration
- Configure Endpoint: Submit Configure Endpoint with interrupt IN endpoint
- Schedule interrupt IN: Start polling for HID reports
HID Polling
xhci_poll() is called from the PIT ISR at 100 Hz. It drains the event ring looking for Transfer Event completions. For each completed interrupt IN transfer on a HID endpoint:
- Keyboard reports (8 bytes) are dispatched to usb_hid_process_report()
- Mouse reports (3+ bytes) are dispatched to usb_mouse_process_report()
- A new interrupt IN transfer is immediately rescheduled
Network Drivers
virtio-net (virtio_net.c)
Overview
The virtio-net driver implements the virtio 1.0 modern transport (PCI capability-list MMIO). It is used with QEMU's virtio-net-pci virtual NIC.
Device Detection
```c
/* vendor 0x1AF4, device 0x1041 (modern) or 0x1000 (legacy-transitional) */
if (d->vendor_id == 0x1AF4 &&
    (d->device_id == 0x1041 || d->device_id == 0x1000))
```
PCI Capability Walk
The driver walks the PCI capability list to locate three virtio-specific structures:
| Sub-type | Name | Purpose |
|---|---|---|
| 1 | COMMON_CFG | Feature negotiation, queue setup, device status |
| 2 | NOTIFY_CFG | Doorbell MMIO base + per-queue multiplier |
| 4 | DEVICE_CFG | Device-specific config (MAC address bytes 0-5) |
Feature Negotiation
RESET → ACKNOWLEDGE → DRIVER → negotiate features → FEATURES_OK → DRIVER_OK
Negotiated features:
- VIRTIO_NET_F_MAC (bit 5): the device provides a MAC address
- VIRTIO_F_VERSION_1 (bit 32): required for modern (non-transitional) devices
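Feature selection reduces to masking the device's offer against what the driver supports. This minimal sketch assumes the 64-bit feature word has already been assembled from the two 32-bit halves exposed through device_feature_select/device_feature in COMMON_CFG. virtio_net_pick_features() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_MAC    (1ull << 5)
#define VIRTIO_F_VERSION_1  (1ull << 32)

/* Select the features the driver accepts. Returns 0 with the accepted set
 * in *out, or -1 if the device does not offer VERSION_1, without which a
 * modern-only driver cannot proceed. */
static int virtio_net_pick_features(uint64_t device_features, uint64_t *out)
{
    const uint64_t wanted = VIRTIO_NET_F_MAC | VIRTIO_F_VERSION_1;
    if (!(device_features & VIRTIO_F_VERSION_1))
        return -1;
    *out = device_features & wanted;
    return 0;
}
```

After writing the accepted set back and setting FEATURES_OK, the virtio specification requires the driver to re-read the device status. A cleared FEATURES_OK bit means the device rejected the negotiated subset.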
Virtqueue Layout
Each virtqueue (RX=0, TX=1) has 256 entries across three DMA regions:
| Component | Size | Purpose |
|---|---|---|
| Descriptor Table | 4096 B (1 page) | 256 x 16B descriptors |
| Available Ring | 518 B (1 page) | Driver-to-device ring |
| Used Ring | 2054 B (1 page) | Device-to-driver ring |
```c
typedef struct __attribute__((packed)) {
    uint64_t addr;   /* physical address of buffer */
    uint32_t len;    /* length in bytes */
    uint16_t flags;  /* NEXT (1), WRITE (2) */
    uint16_t next;   /* next descriptor index */
} virtq_desc_t;
```
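The virtio 1.0 specification defines the split-virtqueue part sizes for a queue of N entries as 16 × N bytes for the descriptor table, 6 + 2 × N for the available ring and 6 + 8 × N for the used ring. The extra 6 bytes hold the flags, idx and event-index fields. This small sketch computes the sizes; the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
} virtq_desc_t;

static_assert(sizeof(virtq_desc_t) == 16, "descriptor is 16 bytes");

/* Available ring: le16 flags, le16 idx, le16 ring[N], le16 used_event.
 * Used ring: le16 flags, le16 idx, {le32 id, le32 len}[N], le16 avail_event. */
static size_t vq_desc_bytes(uint16_t n)  { return 16u * (size_t)n; }
static size_t vq_avail_bytes(uint16_t n) { return 6u + 2u * (size_t)n; }
static size_t vq_used_bytes(uint16_t n)  { return 6u + 8u * (size_t)n; }
```

For N = 256 this gives 4096, 518 and 2054 bytes, so each part fits in a single page.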
RX Path
Pre-filled with 256 receive buffers (1 page each). virtio_net_poll() drains the used ring:
- Read (id, len) from the used ring entry
- Skip the 12-byte virtio_net_hdr (the modern header is always 12 bytes, not 10)
- Call netdev_rx_deliver() with the Ethernet frame
- Return the descriptor to the available ring
- Always kick the RX doorbell (critical for QEMU TCG/SLIRP compatibility)
The unconditional doorbell kick is necessary because, in TCG mode, the guest vCPU and SLIRP share a thread. The MMIO write traps out of translated guest code into QEMU's device emulation (the TCG counterpart of a VM exit), giving SLIRP a chance to process pending RX frames.
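The used ring's idx is a free-running 16-bit counter. Draining therefore compares it against a private last-seen index and maps that index onto a ring slot modulo the queue size. The sketch below uses hypothetical names (vq_pop_used() and a last_used counter). It omits the trailing avail_event field of the real used ring, which is not needed here.

```c
#include <assert.h>
#include <stdint.h>

#define VQ_SIZE 256u   /* queue size; must be a power of two */

typedef struct { uint32_t id; uint32_t len; } virtq_used_elem_t;

typedef struct {
    uint16_t flags;
    uint16_t idx;                        /* advanced by the device */
    virtq_used_elem_t ring[VQ_SIZE];
} virtq_used_t;

/* Pop one used entry. Returns 1 and fills id/len, or 0 if nothing is new. */
static int vq_pop_used(const volatile virtq_used_t *used, uint16_t *last_used,
                       uint32_t *id, uint32_t *len)
{
    if (*last_used == used->idx)
        return 0;
    /* On real hardware, a read barrier belongs here so the element is
     * not read before idx. */
    const volatile virtq_used_elem_t *e = &used->ring[*last_used % VQ_SIZE];
    *id  = e->id;
    *len = e->len;                       /* includes the 12-byte virtio_net_hdr */
    (*last_used)++;                      /* wraps naturally at 65536 */
    return 1;
}
```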
TX Path
```c
static int virtio_net_send(netdev_t *dev, const void *pkt, uint16_t len);
```
- Zero the 12-byte virtio_net_hdr in the bounce buffer and copy the Ethernet frame after it
- Set the descriptor: len = hdr_size + frame_size, flags = 0 (the device reads the buffer)
- Publish to the available ring, sfence, kick the TX doorbell
- Poll synchronously for completion (up to 100,000 iterations)
RTL8169 (rtl8169.c)
Overview
Driver for the Realtek RTL8168/8169 PCIe gigabit Ethernet family. Polling-mode, single-instance. Tested via VFIO PCI passthrough.
Device Detection
```c
/* vendor 0x10EC, device 0x8168 */
if (d->vendor_id == 0x10EC && d->device_id == 0x8168)
```
Initialization Sequence
1. Locate RTL8168 in PCIe device table
2. PCI: enable Memory + Bus Master in command register
3. Wake to D0 via PMCSR (Power Management capability)
4. Map BAR2 as 1 page of uncached MMIO
5. Sanity probe: read MAC0, check for 0xFFFFFFFF (dead device)
6. Soft reset: write CmdReset to ChipCmd, poll until clear
7. Read MAC from MAC0/MAC4 registers
8. PHY reset + auto-negotiation restart via MDIO
9. Allocate RX/TX descriptor rings (256 x 16B = 1 page each)
10. Allocate 256 RX + 256 TX DMA buffers (1 page each = 512 pages)
11. Pre-fill RX descriptors (DescOwn, RingEnd on last)
12. Configure registers (unlock → CPlusCmd → RxMaxSize → ring addrs → lock)
13. Clear RXDV_GATED_EN in MISC register
14. Disable IRQs, clear pending status
15. Enable RX + TX (ChipCmd), configure RxConfig/TxConfig
16. Register netdev "eth0"
Register Access
MMIO registers are accessed via inline functions:
```c
static inline uint8_t rd8(uint16_t off)            { return *(volatile uint8_t *)(mmio + off); }
static inline void    wr8(uint16_t off, uint8_t v) { *(volatile uint8_t *)(mmio + off) = v; }
/* similarly for rd16/wr16/rd32/wr32 */
```
Key registers:
| Register | Offset | Width | Purpose |
|---|---|---|---|
| MAC0/MAC4 | 0x00/0x04 | 32-bit | MAC address (eFuse-latched) |
| ChipCmd | 0x37 | 8-bit | Reset, RX/TX enable |
| TxPoll | 0x38 | 8-bit | Kick TX queue |
| IntrMask/IntrStat | 0x3C/0x3E | 16-bit | Interrupt control (disabled) |
| TxConfig/RxConfig | 0x40/0x44 | 32-bit | DMA burst, IFG, accept mask |
| Cfg9346 | 0x50 | 8-bit | Config register lock/unlock |
| PHYAR | 0x60 | 32-bit | MII PHY register access |
| PHY_STATUS | 0x6C | 8-bit | Link status |
| RxMaxSize | 0xDA | 16-bit | Max frame size |
| CPlusCmd | 0xE0 | 16-bit | Checksum offload |
| RxLow/RxHigh | 0xE4/0xE8 | 32-bit | RX ring base address |
| TxLow/TxHigh | 0x20/0x24 | 32-bit | TX ring base address |
| MISC | 0xF0 | 32-bit | RXDV_GATED_EN at bit 19 |
Descriptor Format (16 bytes)
```c
typedef struct __attribute__((packed)) {
    volatile uint32_t opts1;   /* DescOwn|RingEnd|First|Last|len */
    volatile uint32_t opts2;   /* VLAN/checksum offload (unused) */
    volatile uint64_t addr;    /* physical buffer address */
} rtl_desc_t;
```
Descriptor flags:
| Flag | Bit | Meaning |
|---|---|---|
| DESC_OWN | 31 | Device owns this descriptor |
| DESC_RING_END | 30 | Last descriptor in ring (wrap) |
| DESC_FIRST_FRAG | 29 | First fragment of frame |
| DESC_LAST_FRAG | 28 | Last fragment of frame |
| (length field) | 13:0 | Frame length |
PHY Management (MDIO)
PHY registers are accessed via the PHYAR register at offset 0x60:
```c
/* Write: set bit 31 + (reg << 16) + value, poll bit 31 clear */
static void mdio_write(uint8_t reg, uint16_t value);

/* Read: set (reg << 16), poll bit 31 set, read low 16 bits */
static uint16_t mdio_read(uint8_t reg);
```
At init, the driver resets the PHY and restarts auto-negotiation:
```c
mdio_write(MII_BMCR, MII_BMCR_RESET | MII_BMCR_ANEG_EN | MII_BMCR_ANEG_RST);
```
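The PHYAR handshake can be expressed as pure encode and decode helpers. In this sketch the MII_BMCR* constants use the IEEE 802.3 Clause 22 values: BMCR is register 0, reset is bit 15, auto-negotiation enable is bit 12 and restart is bit 9. These values are assumed to match the driver's own definitions. The phyar_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PHYAR_FLAG         0x80000000u  /* bit 31: write request / read done */

#define MII_BMCR           0x00
#define MII_BMCR_RESET     0x8000u      /* bit 15 */
#define MII_BMCR_ANEG_EN   0x1000u      /* bit 12 */
#define MII_BMCR_ANEG_RST  0x0200u      /* bit 9 */

/* PHYAR value that starts a write; then poll until bit 31 reads back 0. */
static uint32_t phyar_write_cmd(uint8_t reg, uint16_t value)
{
    return PHYAR_FLAG | ((uint32_t)(reg & 0x1F) << 16) | value;
}

/* PHYAR value that starts a read; then poll until bit 31 reads back 1. */
static uint32_t phyar_read_cmd(uint8_t reg)
{
    return (uint32_t)(reg & 0x1F) << 16;
}

/* Returns 1 and the data from the low 16 bits once the read completed. */
static int phyar_read_done(uint32_t phyar, uint16_t *value)
{
    if (!(phyar & PHYAR_FLAG))
        return 0;
    *value = (uint16_t)(phyar & 0xFFFFu);
    return 1;
}
```

Under these definitions, the BMCR write issued at init places 0x80009200 in PHYAR.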
RX Polling
rtl8169_poll() drains the RX ring:
- Check opts1 & DESC_OWN; stop if the device still owns the descriptor
- Extract the frame length from opts1 & 0x3FFF
- Strip the 4-byte FCS (the RTL includes it in the reported length)
- Call netdev_rx_deliver(dev, buf, len - 4)
- Hand the descriptor back with DESC_OWN | RX_BUF_SIZE
- sfence between the descriptor write and the opts1 update
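The following sketch of the RX descriptor handling adds one step that the Security Considerations call for. The reported length is range-checked before the FCS is stripped, so a device cannot push an underflowing or oversized length into netdev_rx_deliver(). RX_BUF_SIZE = 4096 (one page per buffer) is an assumption, and the helper names are hypothetical. Re-arming also preserves DESC_RING_END on the last slot so that the NIC keeps wrapping.

```c
#include <assert.h>
#include <stdint.h>

#define DESC_OWN       (1u << 31)
#define DESC_RING_END  (1u << 30)
#define DESC_LEN_MASK  0x3FFFu
#define RX_BUF_SIZE    4096u      /* assumed: one page per RX buffer */
#define ETH_FCS_LEN    4u

typedef struct __attribute__((packed)) {
    volatile uint32_t opts1;
    volatile uint32_t opts2;
    volatile uint64_t addr;
} rtl_desc_t;

/* Returns the payload length to deliver (FCS stripped), 0 if the device
 * still owns the descriptor, or -1 if the reported length is implausible. */
static int rtl_rx_frame_len(const rtl_desc_t *d)
{
    uint32_t opts1 = d->opts1;
    if (opts1 & DESC_OWN)
        return 0;
    uint32_t len = opts1 & DESC_LEN_MASK;
    if (len <= ETH_FCS_LEN || len > RX_BUF_SIZE)
        return -1;
    return (int)(len - ETH_FCS_LEN);
}

/* Give the descriptor back to the NIC; addr is left unchanged because the
 * buffer is reused in place. */
static void rtl_rx_rearm(rtl_desc_t *d, int is_last)
{
    d->opts2 = 0;
    /* On real hardware, a write barrier belongs here, before the OWN store. */
    d->opts1 = DESC_OWN | (is_last ? DESC_RING_END : 0u) | RX_BUF_SIZE;
}
```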
TX Send
- Check slot availability (opts1 & DESC_OWN)
- Copy the frame to the bounce buffer
- Set opts1 = DESC_OWN | DESC_FIRST_FRAG | DESC_LAST_FRAG | len
- sfence, then kick TX via TxPoll = 0x40 (NPQ bit)
Framebuffer Driver (fb.c)
Overview
The framebuffer driver provides a linear 32-bpp text terminal using an 8x16 VGA font. It maps the hardware framebuffer (provided by GRUB/UEFI GOP) into kernel address space.
Features
- Boot splash: fb_boot_splash() displays the Aegis logo on a dark background during boot
- Panic screen: panic_bluescreen() renders exception details with a Terminus 20px font on a blue background
- Heartbeat: fb_heartbeat() toggles a pixel block to indicate IRQ health
- Compositor handoff: fb_lock_compositor() suppresses kernel text output when a userspace compositor maps the framebuffer
API
```c
void fb_init(void);                    /* Map FB, clear screen, init font */
void fb_putchar(char c);               /* Render one character */
void fb_write_string(const char *s);   /* Render NUL-terminated string */
void fb_lock_compositor(void);         /* Suppress kernel text output */
int  fb_get_phys_info(uint64_t *phys_out, uint32_t *width_out,
                      uint32_t *height_out, uint32_t *pitch_out);
```
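Rendering a character into a linear 32-bpp framebuffer comes down to expanding a glyph bitmap row by row. This sketch assumes the usual 8x16 bitmap font layout, with one byte per glyph row and the most significant bit as the leftmost pixel. It also assumes a scanline length of pitch bytes, the value fb_get_phys_info() reports. fb_draw_glyph() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define GLYPH_W 8
#define GLYPH_H 16

/* Draw one 8x16 glyph at pixel (x, y). `pitch` is the scanline length in
 * bytes and may exceed width * 4 because of padding. */
static void fb_draw_glyph(uint8_t *fb, uint32_t pitch, uint32_t x, uint32_t y,
                          const uint8_t glyph[GLYPH_H], uint32_t fg, uint32_t bg)
{
    for (uint32_t row = 0; row < GLYPH_H; row++) {
        uint32_t *line = (uint32_t *)(fb + (uint64_t)(y + row) * pitch) + x;
        for (uint32_t col = 0; col < GLYPH_W; col++)
            line[col] = (glyph[row] & (0x80u >> col)) ? fg : bg;
    }
}
```

Indexing rows by pitch rather than width is what keeps the text aligned on framebuffers whose scanlines are padded.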
Ramdisk Driver (ramdisk.c)
Overview
The ramdisk driver copies a GRUB multiboot2 module into kernel memory and presents it as a block device. It is used for the root filesystem image and, optionally, an ESP (EFI System Partition) image.
Initialization
```c
void ramdisk_init(uint64_t phys_base, uint64_t size);
void ramdisk_init2(uint64_t phys_base, uint64_t size);   /* second ramdisk */
```
The module is copied into freshly allocated KVA pages rather than mapped in-place, because GRUB places modules near the kernel image where they could overlap with VMM page tables.
Block Device Interface
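```c
static int ramdisk_read(blkdev_t *dev, uint64_t lba, uint32_t count, void *buf) {
    uint64_t total = s_size / 512;             /* image size in sectors */
    /* Overflow-safe bounds check: lba * 512 and off + len could wrap */
    if (lba > total || count > total - lba)
        return -1;
    uint64_t off = lba * 512;
    uint64_t len = (uint64_t)count * 512;      /* widen before multiplying */
    memcpy(buf, s_base + off, len);
    return 0;
}
```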
Registers as "ramdisk0" (root filesystem) and "ramdisk1" (ESP).
USB HID Drivers
Keyboard (usb_hid.c)
Processes 8-byte HID boot-protocol keyboard reports:
typedef struct __attribute__((packed)) {
uint8_t modifier; /* bitfield: Ctrl, Shift, Alt, GUI */
uint8_t reserved;
uint8_t keys[6]; /* HID usage IDs, 0 = no key */
} usb_hid_report_t;
- Compares the current report with the previous one to detect press/release transitions
- Translates HID usage IDs to ASCII via lookup tables (hid_to_ascii[], hid_to_ascii_shift[])
- Supports the Shift and Ctrl modifiers
- Injects keystrokes into the kernel keyboard ring buffer via kbd_usb_inject()
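The press/release comparison can be sketched as a set difference over the six key slots. The sketch skips usage IDs 0x00 (no key) and 0x01 to 0x03 (error codes such as ErrorRollOver). hid_new_keys() is a hypothetical helper. Calling it with the arguments swapped yields releases instead of presses.

```c
#include <assert.h>
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint8_t modifier;
    uint8_t reserved;
    uint8_t keys[6];
} usb_hid_report_t;

static int report_has_key(const usb_hid_report_t *r, uint8_t usage)
{
    for (int i = 0; i < 6; i++)
        if (r->keys[i] == usage)
            return 1;
    return 0;
}

/* Write to `out` the usage IDs present in `cur` but not in `prev`.
 * Returns how many were written (at most 6). */
static int hid_new_keys(const usb_hid_report_t *prev, const usb_hid_report_t *cur,
                        uint8_t out[6])
{
    int n = 0;
    for (int i = 0; i < 6; i++) {
        uint8_t k = cur->keys[i];
        if (k > 0x03 && !report_has_key(prev, k))   /* skip none/error codes */
            out[n++] = k;
    }
    return n;
}
```

A complete handler would also ignore ErrorRollOver reports outright instead of diffing them. Treating such a report as the new state would report every held key as released.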
Mouse (usb_mouse.c)
Processes 3-byte HID boot-protocol mouse reports (button bits followed by signed 8-bit X and Y deltas) and converts each into a mouse_event_t:
```c
typedef struct __attribute__((packed)) {
    uint8_t buttons;   /* bit 0=left, 1=right, 2=middle */
    int16_t dx;        /* X delta */
    int16_t dy;        /* Y delta */
    int16_t scroll;    /* reserved (boot protocol) */
} mouse_event_t;
```
Events are stored in a 128-entry ring buffer. /dev/mouse reads from this buffer with blocking (mouse_read_blocking()) or non-blocking (mouse_poll()) semantics.
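The following sketch covers the mouse path. It parses a 3-byte boot-protocol report (buttons, then signed 8-bit X and Y deltas) into mouse_event_t and queues the result in a 128-entry ring. The helper names and the drop-newest policy for a full ring are assumptions, because the source does not say how overflow is handled. A real implementation also needs synchronization between the polling path and /dev/mouse readers.

```c
#include <assert.h>
#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint8_t buttons;
    int16_t dx;
    int16_t dy;
    int16_t scroll;
} mouse_event_t;

#define MOUSE_RING_SIZE 128u   /* power of two */

typedef struct {
    mouse_event_t ev[MOUSE_RING_SIZE];
    uint32_t head;   /* next write (USB poll path) */
    uint32_t tail;   /* next read (/dev/mouse) */
} mouse_ring_t;

/* Boot-protocol report: byte 0 buttons, bytes 1-2 signed 8-bit deltas. */
static int mouse_parse(const uint8_t *rep, uint32_t len, mouse_event_t *ev)
{
    if (len < 3)
        return -1;
    ev->buttons = rep[0] & 0x07;
    ev->dx = (int8_t)rep[1];
    ev->dy = (int8_t)rep[2];
    ev->scroll = 0;
    return 0;
}

/* Assumed policy: drop the newest event when the ring is full. */
static int mouse_ring_push(mouse_ring_t *r, const mouse_event_t *ev)
{
    if (r->head - r->tail == MOUSE_RING_SIZE)
        return -1;
    r->ev[r->head % MOUSE_RING_SIZE] = *ev;
    r->head++;
    return 0;
}

static int mouse_ring_pop(mouse_ring_t *r, mouse_event_t *ev)
{
    if (r->head == r->tail)
        return -1;
    *ev = r->ev[r->tail % MOUSE_RING_SIZE];
    r->tail++;
    return 0;
}
```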
Driver Summary
| Driver | PCI Match | BAR | DMA Pages | Polling | Registration |
|---|---|---|---|---|---|
| NVMe | 01:08:02 | BAR0 (16 KB) | ~7 pages | Sync doorbell+poll | blkdev_t "nvme0" |
| xHCI | 0C:03:30 | BAR0 (1 MiB) | ~30+ pages | 100 Hz event ring | Internal slot table |
| virtio-net | 1AF4:1041/1000 | BAR regions located via caps | ~520 pages | 100 Hz used ring | netdev_t "eth0" |
| RTL8169 | 10EC:8168 | BAR2 (4 KB) | ~514 pages | 100 Hz descriptor ring | netdev_t "eth0" |
| Framebuffer | N/A (UEFI GOP) | N/A | Mapped from GRUB | N/A | fb_putchar() |
| Ramdisk | N/A (module) | N/A | Copied from module | N/A | blkdev_t "ramdisk0/1" |
Security Considerations
Device drivers are the kernel’s interface to hardware and represent a significant attack surface, particularly for DMA-capable PCI devices:
- No IOMMU: Aegis v1 does not program the IOMMU (VT-d / AMD-Vi). DMA-capable devices have unrestricted access to all physical memory. A compromised or malicious PCI device could read or write arbitrary memory, bypassing all software protections.
- Descriptor trust: All drivers trust values returned by hardware (descriptor lengths, status fields, ring indices). A device returning a crafted RX length larger than the buffer size would cause an out-of-bounds read in netdev_rx_deliver().
- No input validation on MMIO reads: Register values read from device MMIO are used directly in arithmetic (e.g., descriptor lengths, ring indices). Integer overflow or unexpected values could cause memory corruption.
- Single-instance static state: Each driver uses file-scoped static variables. There is no driver isolation – a bug in one driver can corrupt kernel memory and affect all other subsystems.
- PMM frame leaks: The BAR mapping pattern leaks the original PMM frames allocated by kva_alloc_pages(). While not a security vulnerability per se, it represents the kind of resource management shortcut typical of v1 code.
These are not hypothetical risks – they are the expected reality of C device drivers in a from-scratch OS. Production hardening would require IOMMU programming, bounds-checked DMA descriptor processing, and eventually rewriting drivers in Rust with safe hardware abstraction layers.
Related Documentation
- Network Stack Overview – how NIC drivers integrate with the protocol stack
- Boot Process – driver initialization order during boot
- Memory Management – KVA allocator and VMM page mapping