Creating an x86_64 kernel in Rust: Part 1

This is some kind of blog series about my journey as I learn and implement my own Rust kernel.

Any knowledge shared here may not be fully true, since I do this as a hobby and I'm not an experienced kernel developer. I'm just doing this for fun, learning about the true low level bits that make running a computer with the x86_64 architecture possible.

Sometimes I will mention some names or mnemonics, and maybe I won't explain them here, but I always try to put relevant links to more information. After all, if you want to make a kernel, you will need to do research, a lot.

I also won't be doing everything from scratch, for example I will use the x86_64 crate for some interactions with registers, and the acpi crate to parse and use the ACPI tables.

The bootloader

I'll be using limine as the bootloader: it's a modern boot protocol that drops the kernel straight into 64-bit long mode, already mapped in the higher half, and there's a Rust crate for talking to it.

To start, go to the limine Rust template and clone it, or use the "Use this template" button.

This template uses limine and comes with a nice makefile to build and run the kernel under QEMU. The makefile also downloads the UEFI firmware (OVMF) that QEMU needs to boot with UEFI.

I also recommend creating a .cargo/config.toml inside the kernel folder with the following content:

[build]
target = "x86_64-unknown-none"

x86_64-unknown-none is a bare-metal target, which is exactly what we need for a kernel.

If you are using vscode, I also recommend creating a .vscode/settings.json with the following:

{
    "rust-analyzer.cargo.target": "x86_64-unknown-none",
    "rust-analyzer.cargo.allTargets": false
}

You should take some time to look around the makefile and understand what it does.

I will only focus on x86_64 with UEFI, so I modified the relevant run targets a bit (the one that boots from the ISO and the one that boots from the HDD image):

.PHONY: run-x86_64
run-x86_64: ovmf/ovmf-code-$(KARCH).fd ovmf/ovmf-vars-$(KARCH).fd $(IMAGE_NAME).iso
	qemu-system-$(KARCH) \
		-M q35 \
		-drive if=pflash,unit=0,format=raw,file=ovmf/ovmf-code-$(KARCH).fd,readonly=on \
		-drive if=pflash,unit=1,format=raw,file=ovmf/ovmf-vars-$(KARCH).fd \
		-cdrom $(IMAGE_NAME).iso \
		-device isa-debug-exit,iobase=0xf4,iosize=0x04 \
		-serial stdio \
		-no-reboot \
		$(QEMUFLAGS)

.PHONY: run-hdd-x86_64
run-hdd-x86_64: ovmf/ovmf-code-$(KARCH).fd ovmf/ovmf-vars-$(KARCH).fd $(IMAGE_NAME).hdd
	qemu-system-$(KARCH) \
		-M q35 \
		-drive if=pflash,unit=0,format=raw,file=ovmf/ovmf-code-$(KARCH).fd,readonly=on \
		-drive if=pflash,unit=1,format=raw,file=ovmf/ovmf-vars-$(KARCH).fd \
		-hda $(IMAGE_NAME).hdd \
		-device isa-debug-exit,iobase=0xf4,iosize=0x04 \
		-serial stdio \
		-no-reboot \
		$(QEMUFLAGS)

The main changes are adding a serial console for debug output and disabling the automatic reboot.
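There's no serial driver in the kernel yet, so here is a minimal sketch (not part of the template, just for illustration) of pushing a byte out of COM1 with the x86_64 crate's port API. It assumes COM1 sits at the conventional I/O port 0x3F8, which is what QEMU wires to stdio with -serial stdio; a real driver would also initialize the UART first.

// Sketch only: write one byte to COM1 (0x3F8), which QEMU forwards to stdout
// because of -serial stdio. A real driver would first set up the UART
// (baud rate, line control, FIFOs).
use x86_64::instructions::port::Port;

pub fn serial_write_byte(byte: u8) {
    let mut data: Port<u8> = Port::new(0x3F8);
    unsafe { data.write(byte) };
}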

Disabling the automatic reboot is handy in case the kernel triple faults: instead of QEMU rebooting over and over, it just stops, so you can see the output that led to the fault.
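The isa-debug-exit device in the flags above is also useful: the kernel can write to its I/O port to make QEMU exit with a given status, which comes in handy later for running tests. A hedged sketch, assuming the iobase 0xf4 and 4-byte iosize configured in the makefile:

use x86_64::instructions::{hlt, port::Port};

// Sketch: writing value N to port 0xf4 makes QEMU exit with status (N << 1) | 1.
pub fn exit_qemu(code: u32) -> ! {
    let mut port: Port<u32> = Port::new(0xf4);
    unsafe { port.write(code) };
    // If QEMU didn't exit for some reason, just halt forever.
    loop {
        hlt();
    }
}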

Limine

With limine, you make "requests" to the bootloader, and it provides you with the information you asked for. For example, to get the framebuffer, you request it with:

#[used]
#[unsafe(link_section = ".requests")]
pub static FRAMEBUFFER_REQUEST: FramebufferRequest = FramebufferRequest::new();

This works together with a custom linker script, which comes with the template.

The linker script is needed because we need to tell the linker exactly where to put our requests in memory so that limine can find them. Without it, the linker might put our data anywhere, and limine wouldn't know where to look:

.data : {
    *(.data .data.*)

    /* Place the sections that contain the Limine requests as part of the .data */
    /* output section. */
    KEEP(*(.requests_start_marker))
    KEEP(*(.requests))
    KEEP(*(.requests_end_marker))
} :data

You should read the linker script and understand it.

See wiki.osdev.org/Linker_Scripts and the GNU ld documentation.

On the script, you will also see a relevant entry:

/* We want to be placed in the topmost 2GiB of the address space, for optimisations */
/* and because that is what the Limine spec mandates. */
/* Any address in this region will do, but often 0xffffffff80000000 is chosen as */
/* that is the beginning of the region. */
. = 0xffffffff80000000;

This puts our kernel in the higher half of the virtual address space, which we will map so that it can only be accessed from ring 0 (the highest privilege level, where the kernel runs). It also places the kernel in the topmost 2 GiB of the address space.

The higher half approach is pretty common for kernels because it gives us some nice benefits: it protects the kernel from user programs (they can't access these addresses), and it makes memory management easier since we can map the kernel at the same virtual address in every process.

Small intro to paging

Paging's main use is to provide each process with its own "virtual address space".

Virtual memory is divided into fixed-size blocks called pages, while physical memory is divided into equally sized page frames. Each page can be individually mapped to any frame, avoiding memory fragmentation. This will be the job of our frame allocator alongside the memory mapper.

On x86_64, paging is done by the Memory Management Unit (MMU) walking a hierarchy of tables: the Page Map Level 4 (PML4), the Page Directory Pointer Table (PDPT), the Page Directory (PD) and finally the Page Table (PT), whose entries point to the physical frames.

The 48-bit virtual addresses need to be canonical, that is, bits 48 to 63 must be copies of bit 47 (sign extension). Because of this, there is a gap of non-canonical addresses in the middle of the address space that are not valid.
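As a small illustration (this is not needed anywhere yet), checking whether an address is canonical boils down to looking at the top 17 bits:

/// Sketch: a 48-bit virtual address is canonical when bits 48..=63
/// are copies of bit 47.
fn is_canonical(addr: u64) -> bool {
    let top = addr >> 47; // bit 47 and everything above it (17 bits)
    top == 0 || top == 0x1FFFF
}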

Each page table has 512 entries of 8 bytes each, requiring 9 bits to address each entry.

When the CPU needs to translate a virtual address, it basically does this: takes the virtual address, splits it into chunks (9 bits for each level), and uses each chunk as an index to walk through the page tables until it finds the physical address. It's like following a path through multiple directories to find a file.
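For illustration, the split looks roughly like this:

/// Sketch: split a virtual address into the four table indexes (9 bits each)
/// plus the 12-bit offset into the 4 KiB page.
fn split_virtual_address(addr: u64) -> (u16, u16, u16, u16, u16) {
    let offset = (addr & 0xFFF) as u16;         // bits 0..=11: offset inside the page
    let pt     = ((addr >> 12) & 0x1FF) as u16; // bits 12..=20: page table index
    let pd     = ((addr >> 21) & 0x1FF) as u16; // bits 21..=29: page directory index
    let pdpt   = ((addr >> 30) & 0x1FF) as u16; // bits 30..=38: page directory pointer index
    let pml4   = ((addr >> 39) & 0x1FF) as u16; // bits 39..=47: PML4 index
    (pml4, pdpt, pd, pt, offset)
}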

To improve performance, x86_64 caches recent translations in the Translation Lookaside Buffer (TLB), so a cached translation skips the multi-level table walk. The TLB must be invalidated manually by the kernel when page tables change, using the invlpg instruction.
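The x86_64 crate wraps this instruction, so we won't need inline assembly for it; a minimal sketch:

use x86_64::instructions::tlb;
use x86_64::VirtAddr;

/// Sketch: invalidate the cached translation for a single page (compiles to invlpg).
fn flush_page(addr: u64) {
    tlb::flush(VirtAddr::new(addr));
}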

Check out especially the osdev wiki page on Paging to understand more.

Reorganizing

I moved most of the main.rs code into a boot.rs file. It holds most of the limine-related stuff and the real entry point, which then calls main() in main.rs.

I did this because it's good to separate the boot logic from the actual kernel logic. The boot stuff is pretty specific to limine and getting everything set up, while main.rs should focus on the actual kernel functionality. Makes things cleaner and easier to understand.

I also created a BootInfo struct, alongside a global static BOOT_INFO, to access it easily.

// boot.rs

use limine::framebuffer::Framebuffer;
use limine::request::{FramebufferRequest, RequestsEndMarker, RequestsStartMarker};
use limine::BaseRevision;
use spin::Once;

use crate::main;

/// Sets the base revision to the latest revision supported by the crate.
/// See specification for further info.
/// Be sure to mark all limine requests with #[used], otherwise they may be removed by the compiler.
#[used]
// The .requests section allows limine to find the requests faster and more safely.
#[unsafe(link_section = ".requests")]
pub static BASE_REVISION: BaseRevision = BaseRevision::new();

#[used]
#[unsafe(link_section = ".requests")]
pub static FRAMEBUFFER_REQUEST: FramebufferRequest = FramebufferRequest::new();

/// Define the start and end markers for Limine requests.
#[used]
#[unsafe(link_section = ".requests_start_marker")]
static _START_MARKER: RequestsStartMarker = RequestsStartMarker::new();
#[used]
#[unsafe(link_section = ".requests_end_marker")]
static _END_MARKER: RequestsEndMarker = RequestsEndMarker::new();

#[expect(unused)]
pub struct BootInfo {
    pub framebuffer: Framebuffer<'static>,
}

pub static BOOT_INFO: Once<BootInfo> = Once::new();

pub fn boot_info() -> &'static BootInfo {
    unsafe { BOOT_INFO.get().unwrap_unchecked() }
}

#[unsafe(no_mangle)]
unsafe extern "C" fn kmain() -> ! {
    // All limine requests must also be referenced in a called function, otherwise they may be
    // removed by the linker.
    assert!(BASE_REVISION.is_supported());

    let framebuffer = FRAMEBUFFER_REQUEST
        .get_response()
        .expect("need a framebuffer")
        .framebuffers()
        .next()
        .expect("need a framebuffer");

    let boot_info = BootInfo {
        framebuffer,
    };

    BOOT_INFO.call_once(|| boot_info);

    main()
}


// main.rs

#![no_std]
#![no_main]

use x86_64::instructions::hlt;

mod boot;

fn main() -> ! {
    loop {
        hlt();
    }
}

#[panic_handler]
fn rust_panic(_info: &core::panic::PanicInfo) -> ! {
    loop {
        hlt();
    }
}

Once comes from the spin crate (it's like a no_std OnceCell), and hlt from the x86_64 crate.

The hlt instruction is better than just doing loop {} because it actually puts the CPU to sleep until an interrupt happens, which saves power and doesn't waste CPU cycles.

spin = "0.10.0"
x86_64 = "0.15.2"
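To see all of this working, here is a hedged sketch of drawing from main() by filling the screen with a single color through the framebuffer we stored in BOOT_INFO. The accessor names (addr, pitch, width, height, bpp) come from the limine crate's Framebuffer, a 32-bit pixel format is assumed, and if you actually use the field like this you should drop the #[expect(unused)] on BootInfo.

// Sketch: fill the whole screen with one color, assuming 32 bits per pixel.
fn clear_screen(color: u32) {
    let fb = &crate::boot::boot_info().framebuffer;
    let bytes_per_pixel = (fb.bpp() / 8) as u64;
    for y in 0..fb.height() {
        for x in 0..fb.width() {
            // pitch() is the number of bytes per row.
            let offset = y * fb.pitch() + x * bytes_per_pixel;
            unsafe {
                fb.addr()
                    .add(offset as usize)
                    .cast::<u32>()
                    .write_volatile(color);
            }
        }
    }
}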

What's next?
