Rustyvisor Adventures

Hacking continues on lguest, with changes now committed for host SMP support. This is obviously a precursor to guest SMP support (coming next), but also scratches a major itch in being able to move away from a UP kernel in the development environment and get back to 21st century kernel compile times.

Adding SMP host support involved the removal of a bunch of globals in the assembly hypervisor shim (a tiny chunk of code which performs guest/host switching and interrupt para-virtualization), which is thus now even smaller and better encapsulated. The other notable change was converting the hypervisor’s pte page from global to per-cpu, another form of cleanup.
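To make the global-to-per-cpu conversion concrete, here is a minimal userspace sketch of the idea. It is not the lguest code: every name in it (sketch_pte_pages, sketch_pte_page_for, sketch_map, SKETCH_NR_CPUS) is made up for illustration, and a plain array indexed by CPU number stands in for the kernel's real per-cpu machinery.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes for the sketch; a real kernel gets the CPU count elsewhere. */
#define SKETCH_NR_CPUS       4
#define SKETCH_PTES_PER_PAGE 1024   /* 4 KiB page / 4-byte non-PAE i386 pte */

struct sketch_pte_page {
    uint32_t pte[SKETCH_PTES_PER_PAGE];
};

/*
 * Before: a single global page, shared (and fought over) by every CPU.
 *   static struct sketch_pte_page sketch_pte_page;
 * After: one page per CPU, so no CPU touches another CPU's mappings.
 */
static struct sketch_pte_page sketch_pte_pages[SKETCH_NR_CPUS];

/* Return the pte page that belongs to the given CPU. */
static struct sketch_pte_page *sketch_pte_page_for(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    return &sketch_pte_pages[cpu];
}

/* Install one entry in the given CPU's pte page only. */
static void sketch_map(unsigned int cpu, unsigned int idx, uint32_t val)
{
    assert(idx < SKETCH_PTES_PER_PAGE);
    sketch_pte_page_for(cpu)->pte[idx] = val;
}
```

Calling sketch_map(0, 5, 0x1001) changes entry 5 for CPU 0 and leaves CPU 1's page untouched, which is the property that lets guests run on several host CPUs at once.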

It’s nice when you can add a feature which actually simplifies and reduces the code. One of the great things about lguest is its simplicity, and a key challenge will be avoiding complexity and bloat as it matures.

These patches could do with some more testing. There was a bug in the PGE-disable code which caused a bit of head scratching for a while. The PGE feature of PPro+ i386 chips allows pages to be marked as ‘global’, and not be included in TLB flushes. This is used for marking things like the kernel text as global, so that kernel TLB entries are not flushed on every context switch, which is handy because the kernel doesn’t get out much (like kernel hackers), and always sits at PAGE_OFFSET, after the end of the process address space (unlike kernel hackers, who sit at desks). PGE is currently a problem when a guest is running, as each guest involves a real Linux kernel running (in ring 1), so the kernel address space is now shared and no longer global.

This bug, which caused all kinds of ugly oopses in the host, exhibited some odd behavior: things would work perfectly under qemu, and also if you used taskset on a real machine to bind the lguest VMM (and thus the guest) to a single CPU. It seems that qemu’s SMP model also binds processes to a single CPU (as far as I observed, at least), which meant that debugging the problem under qemu[1] wasn’t going to be much help. It was also a big clue. What was happening is that PGE was only being disabled on the current CPU, and when a guest kernel was run on a different CPU, it would collide with global TLB entries for the host kernel previously running on that CPU. Ahh, fun!
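The failure mode is easy to model in userspace. The toy simulation below is not the lguest fix: the sim_* names and the two-CPU machine are invented for illustration. The only hardware facts it relies on are that CR4.PGE is bit 7 and that changing it flushes the TLB on that CPU, global entries included. In a real kernel, running something on every CPU is the sort of job on_each_cpu() does; whether the actual patch used it is not something this sketch claims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_NR_CPUS 2               /* hypothetical two-CPU host */
#define SIM_CR4_PGE (1u << 7)       /* CR4.PGE is bit 7 on x86 */

struct sim_cpu {
    uint32_t cr4;                   /* modelled CR4 register */
    bool has_global_tlb_entries;    /* stale host kernel mappings cached? */
};

static struct sim_cpu sim_cpus[SIM_NR_CPUS];

/* Host boots with PGE on everywhere, so every CPU caches global entries. */
static void sim_boot(void)
{
    for (unsigned int i = 0; i < SIM_NR_CPUS; i++) {
        sim_cpus[i].cr4 = SIM_CR4_PGE;
        sim_cpus[i].has_global_tlb_entries = true;
    }
}

/* Clearing CR4.PGE also flushes that CPU's TLB, global entries included. */
static void sim_disable_pge_on(unsigned int cpu)
{
    assert(cpu < SIM_NR_CPUS);
    sim_cpus[cpu].cr4 &= ~SIM_CR4_PGE;
    sim_cpus[cpu].has_global_tlb_entries = false;
}

/* The buggy behaviour: only the CPU we happen to be on is fixed up. */
static void sim_disable_pge_current_only(unsigned int current_cpu)
{
    sim_disable_pge_on(current_cpu);
}

/* The intended behaviour: every CPU has PGE disabled. */
static void sim_disable_pge_all(void)
{
    for (unsigned int i = 0; i < SIM_NR_CPUS; i++)
        sim_disable_pge_on(i);
}

/* A guest collides with the host if stale global entries remain. */
static bool sim_guest_safe_on(unsigned int cpu)
{
    assert(cpu < SIM_NR_CPUS);
    return !sim_cpus[cpu].has_global_tlb_entries;
}
```

After sim_boot() and sim_disable_pge_current_only(0), the guest is safe on CPU 0 but not on CPU 1, which matches the symptom that pinning everything to one CPU with taskset hid the problem. With sim_disable_pge_all() it is safe on both.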

Btw, for anyone who wants to help out, there’s a great overview of lguest by Jon Corbet of LWN. Between that, and Rusty’s documentation, lguest could be one of the best documented projects of its type, making it even easier to hack on.

[1] I should mention how easy it is to debug the lguest host with qemu (via Rusty):

  1. Compile kernel with CONFIG_DEBUG_INFO=y
  2. Run it under qemu with the gdb stub enabled: $ qemu -s
  3. $ gdb vmlinux
  4. (gdb) target remote localhost:1234