This note describes how to use KVM (Kernel-based Virtual Machine) and its PCI passthrough capability, whereby a PCI device can be assigned directly to a virtual machine.
PCI passthrough (the '-pcidevice' option) is supported from KVM-79 onward; the current top of tree is KVM-84. As the KVM-79 release notes indicate, a 2.6.28 kernel is required:
http://www.linux-kvm.com/content/kvm-79-released-pci-device-assignment-pci-device-hot-plug
There are three components to KVM:
The 2.6.28 kernel and kvm.ko loadable module.
The KVM loadable module (kvm-intel.ko or kvm-amd.ko).
The KVM-modified QEMU userspace binary. At some point, the KVM changes will be merged back into mainline QEMU development.
For our Fedora 10 systems, I downloaded a prebuilt 2.6.28 kernel from here:
http://koji.fedoraproject.org/koji/buildinfo?buildID=79697
Note: although the release notes say that a 2.6.28 kernel is sufficient for KVM, a 2.6.29 kernel is required for MSI support (which we are interested in). See the ‘Virtualization’ section of the 2.6.29 release notes:
http://kernelnewbies.org/Linux_2_6_29
After installing the kernel, I downloaded KVM-84 from here:
http://kvm.qumranet.com/kvmwiki/Downloads
That wiki page also links to a HOWTO for installing and running KVM, but I found its instructions for creating a root filesystem for the virtual machine to be incorrect. The steps I actually followed are described below.
After unpacking the tarball, I did the following:
user% cd kvm-84
user% ./configure --prefix=/usr/local/kvm
user% make
Passing '--prefix' installs the qemu binaries under /usr/local/kvm/bin. You may need to install zlib, SDL, and kernel-devel on your system in order for the make to succeed.
Then, as root:
root% make install
root% modprobe kvm-intel
(or kvm-amd if you are running on an AMD system)
You can verify the module is loaded:
% lsmod | grep kvm
kvm_intel 56424 0
kvm 164256 1 kvm_intel
Your CPU must support hardware virtualization extensions (Intel VT-x or AMD SVM) in order for KVM to work. (Intel’s VT-d is a separate feature: it provides the IOMMU used for safe PCI passthrough.)
Now, we have to create a root filesystem. The simplest method is to use dd to create a sparse file:
% dd if=/dev/zero of=vdisk.img bs=1 seek=10G count=0
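A sparse file reports its full apparent size while consuming almost no disk blocks until data is written. You can confirm this by comparing ls (apparent size) with du (allocated blocks):

```shell
# Create the sparse backing file (same command as above)
dd if=/dev/zero of=vdisk.img bs=1 seek=10G count=0

# 'ls -lh' reports the apparent size (10G); 'du -h' reports the blocks
# actually allocated, which should be (close to) zero for a fresh image
ls -lh vdisk.img
du -h vdisk.img
```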
There is also a qemu-img tool, which can generate a QCOW-format disk image, but I did not try it; the sparse file worked fine for me.
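For reference, if you do want to try the QCOW route, the invocation would look something like the following (untested here; the path assumes qemu-img was installed under the same --prefix as qemu):

```shell
# Untested alternative: create a 10 GB qcow2 image with qemu-img
/usr/local/kvm/bin/qemu-img create -f qcow2 vdisk.qcow2 10G
```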
Then, you can invoke the emulator to install an OS. Download a .iso for an OS to your system. Review the list of supported guest OSes here:
http://kvm.qumranet.com/kvmwiki/Guest_Support_Status
I used Debian Lenny on my system. Install the OS as follows:
root% /usr/local/kvm/bin/qemu-system-x86_64 -hda vdisk.img -cdrom ./debian-500-i386-netinst.iso -boot d -m 384
After completing the OS install, you can run the virtual machine as follows:
root% /usr/local/kvm/bin/qemu-system-x86_64 vdisk.img -m 384
'-m' specifies the amount of memory (in MB) given to the virtual machine. Note that you must be root in order to run qemu (it needs access to /dev/kvm and other device files).
You can run QEMU from the vga console (through the ‘other’ KVM) as root, or through an ssh session with X forwarding:
% ssh -X ${REMOTE_HOST}
When I attempted to run QEMU from within a VNC session as root, I got the following error:
Invalid MIT-MAGIC-COOKIE-1 key
Invalid MIT-MAGIC-COOKIE-1 key
Could not initialize SDL - exiting
X11 forwarding can also fail when running as root/sudo on the server (you ‘ssh -X’ to the machine as a regular user, but then switch to root to run qemu). One fix I’ve discovered is to copy ${HOME}/.Xauthority to /root; root can then open forwarded X windows.
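Concretely, the fix looks like this (run as root after the regular user has logged in with 'ssh -X'; the username 'user' is an example):

```shell
# Copy the regular user's X credentials so root can use the forwarded display
cp ~user/.Xauthority /root/.Xauthority
```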
In order to boot a virtual machine with a PCI device assigned, first obtain the device ID of your device (in the form “bus:device.function”). lspci will display PCI device information:
% lspci
% lspci -v (verbose mode)
% lspci -t (device IDs are displayed in tree form)
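If you want to script this, the device ID is simply the first whitespace-delimited field of the matching lspci line. A minimal sketch (the sample line below stands in for live lspci output; in practice pipe lspci through grep and the same awk):

```shell
# Extract the "bus:device.function" ID from an lspci output line
sample='02:00.0 Ethernet controller: Intel Corporation Device 10c9 (rev 01)'
dev_id=$(printf '%s\n' "$sample" | awk '{print $1}')
echo "$dev_id"
```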
For example, if our device was assigned ID 02:00.0, we invoke qemu as follows:
root% /usr/local/kvm/bin/qemu-system-x86_64 vdisk.img -m 384 -pcidevice host=02:00.0
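One step the qemu command line does not show: if a host driver (here, igb for the ethernet ports) has already claimed the device, you may need to unbind it first so the guest can take ownership. A sketch via sysfs, assuming the device at 02:00.0 in PCI domain 0000:

```shell
# Release 02:00.0 from its host driver before assigning it to the guest
# (sysfs slot names include the PCI domain, hence the 0000: prefix)
echo -n 0000:02:00.0 > /sys/bus/pci/drivers/igb/unbind
```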
PCI passthrough has a limitation: the device to be passed through must either use an IRQ that is not shared (an IRQ cannot be shared between the host and the guest) or support MSI. On my system, I attempted to pass through a Xilinx FPGA and both ports of a dual-port ethernet card.
% lspci
[…]
02:00.0 Ethernet controller: Intel Corporation Device 10c9 (rev 01)
02:00.1 Ethernet controller: Intel Corporation Device 10c9 (rev 01)
03:00.0 Memory controller: Xilinx Corporation Device 0007
[…]
Unfortunately, I was only able to pass through the second ethernet port (02:00.1). ‘lspci -v’ showed the IRQ information:
02:00.0 Ethernet controller: Intel Corporation Device 10c9 (rev 01)
Subsystem: Intel Corporation Device a03c
Flags: bus master, fast devsel, latency 0, IRQ 16
Memory at e1820000 (32-bit, non-prefetchable) [size=128K]
Memory at e1400000 (32-bit, non-prefetchable) [size=4M]
I/O ports at 3020 [size=32]
Memory at e18c4000 (32-bit, non-prefetchable) [size=16K]
Expansion ROM at e2000000 [disabled] [size=4M]
Capabilities:
Kernel driver in use: igb
Kernel modules: igb
02:00.1 Ethernet controller: Intel Corporation Device 10c9 (rev 01)
Subsystem: Intel Corporation Device a03c
Flags: fast devsel, IRQ 17
Memory at e1800000 (32-bit, non-prefetchable) [size=128K]
Memory at e1000000 (32-bit, non-prefetchable) [size=4M]
I/O ports at 3000 [size=32]
Memory at e18c0000 (32-bit, non-prefetchable) [size=16K]
Expansion ROM at e2400000 [disabled] [size=4M]
Capabilities:
Kernel modules: igb
03:00.0 Memory controller: Xilinx Corporation Device 0007
Subsystem: Xilinx Corporation Device 0007
Flags: bus master, fast devsel, latency 0, IRQ 10
Memory at e1b00000 (64-bit, non-prefetchable) [size=1K]
Capabilities:
The first ethernet port and the Xilinx FPGA shared IRQ 16 with the USB hub and the SATA controller. The second ethernet port used IRQ 17, which it shared with no other device. None of the devices support MSI, so only the second ethernet port was assignable to the virtual machine. On some systems it may be possible to reassign IRQs in the BIOS, but our xc5-pc* machines do not support that functionality. Changing the PCI slot of a device can sometimes change its IRQ assignment, but that is not an option here either: both PCIe slots share the same IRQ.
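To audit IRQ sharing quickly, you can group devices by IRQ straight from ‘lspci -v’ output. A sketch, fed with sample data standing in for the live output (in practice, pipe lspci -v through the same awk):

```shell
# Group devices by IRQ to spot sharing; sample mimics 'lspci -v' output
sample='02:00.0 Ethernet controller: Intel Corporation Device 10c9
    Flags: bus master, fast devsel, latency 0, IRQ 16
02:00.1 Ethernet controller: Intel Corporation Device 10c9
    Flags: fast devsel, IRQ 17
03:00.0 Memory controller: Xilinx Corporation Device 0007
    Flags: bus master, fast devsel, latency 0, IRQ 16'

# Device header lines start with a hex bus number; flag lines are indented.
# Remember the current device, then print "<irq> <device>" for each IRQ line.
irq_map=$(printf '%s\n' "$sample" |
    awk '/^[0-9a-f]/ {dev=$1}
         match($0, /IRQ [0-9]+/) {print substr($0, RSTART+4, RLENGTH-4), dev}' |
    sort -n)
echo "$irq_map"
```

Devices that appear under the same IRQ number cannot be passed through individually unless they support MSI.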
When we assigned the second ethernet port to the virtual machine, we saw the following in the virtual machine:
% lspci
[…]
00:06.0 Ethernet controller: Intel Corporation Device 10c9 (rev 01)
[…]