Skip to content

Commit

Permalink
KVM: ppc: PowerPC 440 KVM implementation
Browse files Browse the repository at this point in the history
This functionality is definitely experimental, but is capable of running
unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only
tested with 440EP "Bamboo" guests so far, but with appropriate userspace
support other SoC/board combinations should work.)

See Documentation/powerpc/kvm_440.txt for technical details.

[stephen: build fix]

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
  • Loading branch information
Hollis Blanchard authored and Avi Kivity committed Apr 27, 2008
1 parent 513014b commit bbf45ba
Show file tree
Hide file tree
Showing 19 changed files with 3,159 additions and 2 deletions.
41 changes: 41 additions & 0 deletions Documentation/powerpc/kvm_440.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
Hollis Blanchard <hollisb@us.ibm.com>
15 Apr 2008

Various notes on the implementation of KVM for PowerPC 440:

To enforce isolation, host userspace, guest kernel, and guest userspace all
run at user privilege level. Only the host kernel runs in supervisor mode.
Executing privileged instructions in the guest traps into KVM (in the host
kernel), where we decode and emulate them. Through this technique, unmodified
440 Linux kernels can be run (slowly) as guests. Future performance work will
focus on reducing the overhead and frequency of these traps.

The usual code flow is started from userspace invoking an "run" ioctl, which
causes KVM to switch into guest context. We use IVPR to hijack the host
interrupt vectors while running the guest, which allows us to direct all
interrupts to kvmppc_handle_interrupt(). At this point, we could either
- handle the interrupt completely (e.g. emulate "mtspr SPRG0"), or
- let the host interrupt handler run (e.g. when the decrementer fires), or
- return to host userspace (e.g. when the guest performs device MMIO)

Address spaces: We take advantage of the fact that Linux doesn't use the AS=1
address space (in host or guest), which gives us virtual address space to use
for guest mappings. While the guest is running, the host kernel remains mapped
in AS=0, but the guest can only use AS=1 mappings.

TLB entries: The TLB entries covering the host linear mapping remain
present while running the guest. This reduces the overhead of lightweight
exits, which are handled by KVM running in the host kernel. We keep three
copies of the TLB:
- guest TLB: contents of the TLB as the guest sees it
- shadow TLB: the TLB that is actually in hardware while guest is running
- host TLB: to restore TLB state when context switching guest -> host
When a TLB miss occurs because a mapping was not present in the shadow TLB,
but was present in the guest TLB, KVM handles the fault without invoking the
guest. Large guest pages are backed by multiple 4KB shadow pages through this
mechanism.

IO: MMIO and DCR accesses are emulated by userspace. We use virtio for network
and block IO, so those drivers must be enabled in the guest. It's possible
that some qemu device emulation (e.g. e1000 or rtl8139) may also work with
little effort.
1 change: 1 addition & 0 deletions arch/powerpc/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -803,3 +803,4 @@ config PPC_CLOCK
config PPC_LIB_RHEAP
bool

source "arch/powerpc/kvm/Kconfig"
3 changes: 3 additions & 0 deletions arch/powerpc/Kconfig.debug
Original file line number Diff line number Diff line change
Expand Up @@ -151,6 +151,9 @@ config BOOTX_TEXT

config PPC_EARLY_DEBUG
bool "Early debugging (dangerous)"
# PPC_EARLY_DEBUG on 440 leaves AS=1 mappings above the TLB high water
# mark, which doesn't work with current 440 KVM.
depends on !KVM
help
Say Y to enable some early debugging facilities that may be available
for your processor/board combination. Those facilities are hacks
Expand Down
1 change: 1 addition & 0 deletions arch/powerpc/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -145,6 +145,7 @@ core-y += arch/powerpc/kernel/ \
arch/powerpc/platforms/
core-$(CONFIG_MATH_EMULATION) += arch/powerpc/math-emu/
core-$(CONFIG_XMON) += arch/powerpc/xmon/
core-$(CONFIG_KVM) += arch/powerpc/kvm/

drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/

Expand Down
28 changes: 28 additions & 0 deletions arch/powerpc/kernel/asm-offsets.c
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,9 @@
#include <linux/mm.h>
#include <linux/suspend.h>
#include <linux/hrtimer.h>
#ifdef CONFIG_KVM
#include <linux/kvm_host.h>
#endif
#ifdef CONFIG_PPC64
#include <linux/time.h>
#include <linux/hardirq.h>
Expand Down Expand Up @@ -324,5 +327,30 @@ int main(void)

DEFINE(PGD_TABLE_SIZE, PGD_TABLE_SIZE);

#ifdef CONFIG_KVM
DEFINE(TLBE_BYTES, sizeof(struct tlbe));

DEFINE(VCPU_HOST_STACK, offsetof(struct kvm_vcpu, arch.host_stack));
DEFINE(VCPU_HOST_PID, offsetof(struct kvm_vcpu, arch.host_pid));
DEFINE(VCPU_HOST_TLB, offsetof(struct kvm_vcpu, arch.host_tlb));
DEFINE(VCPU_SHADOW_TLB, offsetof(struct kvm_vcpu, arch.shadow_tlb));
DEFINE(VCPU_GPRS, offsetof(struct kvm_vcpu, arch.gpr));
DEFINE(VCPU_LR, offsetof(struct kvm_vcpu, arch.lr));
DEFINE(VCPU_CR, offsetof(struct kvm_vcpu, arch.cr));
DEFINE(VCPU_XER, offsetof(struct kvm_vcpu, arch.xer));
DEFINE(VCPU_CTR, offsetof(struct kvm_vcpu, arch.ctr));
DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.pc));
DEFINE(VCPU_MSR, offsetof(struct kvm_vcpu, arch.msr));
DEFINE(VCPU_SPRG4, offsetof(struct kvm_vcpu, arch.sprg4));
DEFINE(VCPU_SPRG5, offsetof(struct kvm_vcpu, arch.sprg5));
DEFINE(VCPU_SPRG6, offsetof(struct kvm_vcpu, arch.sprg6));
DEFINE(VCPU_SPRG7, offsetof(struct kvm_vcpu, arch.sprg7));
DEFINE(VCPU_PID, offsetof(struct kvm_vcpu, arch.pid));

DEFINE(VCPU_LAST_INST, offsetof(struct kvm_vcpu, arch.last_inst));
DEFINE(VCPU_FAULT_DEAR, offsetof(struct kvm_vcpu, arch.fault_dear));
DEFINE(VCPU_FAULT_ESR, offsetof(struct kvm_vcpu, arch.fault_esr));
#endif

return 0;
}
224 changes: 224 additions & 0 deletions arch/powerpc/kvm/44x_tlb.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,224 @@
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License, version 2, as
* published by the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*
* Copyright IBM Corp. 2007
*
* Authors: Hollis Blanchard <hollisb@us.ibm.com>
*/

#include <linux/types.h>
#include <linux/string.h>
#include <linux/kvm_host.h>
#include <linux/highmem.h>
#include <asm/mmu-44x.h>
#include <asm/kvm_ppc.h>

#include "44x_tlb.h"

#define PPC44x_TLB_USER_PERM_MASK (PPC44x_TLB_UX|PPC44x_TLB_UR|PPC44x_TLB_UW)
#define PPC44x_TLB_SUPER_PERM_MASK (PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW)

static unsigned int kvmppc_tlb_44x_pos;

static u32 kvmppc_44x_tlb_shadow_attrib(u32 attrib, int usermode)
{
/* Mask off reserved bits. */
attrib &= PPC44x_TLB_PERM_MASK|PPC44x_TLB_ATTR_MASK;

if (!usermode) {
/* Guest is in supervisor mode, so we need to translate guest
* supervisor permissions into user permissions. */
attrib &= ~PPC44x_TLB_USER_PERM_MASK;
attrib |= (attrib & PPC44x_TLB_SUPER_PERM_MASK) << 3;
}

/* Make sure host can always access this memory. */
attrib |= PPC44x_TLB_SX|PPC44x_TLB_SR|PPC44x_TLB_SW;

return attrib;
}

/* Search the guest TLB for a matching entry. */
int kvmppc_44x_tlb_index(struct kvm_vcpu *vcpu, gva_t eaddr, unsigned int pid,
unsigned int as)
{
int i;

/* XXX Replace loop with fancy data structures. */
for (i = 0; i < PPC44x_TLB_SIZE; i++) {
struct tlbe *tlbe = &vcpu->arch.guest_tlb[i];
unsigned int tid;

if (eaddr < get_tlb_eaddr(tlbe))
continue;

if (eaddr > get_tlb_end(tlbe))
continue;

tid = get_tlb_tid(tlbe);
if (tid && (tid != pid))
continue;

if (!get_tlb_v(tlbe))
continue;

if (get_tlb_ts(tlbe) != as)
continue;

return i;
}

return -1;
}

struct tlbe *kvmppc_44x_itlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
{
unsigned int as = !!(vcpu->arch.msr & MSR_IS);
unsigned int index;

index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
if (index == -1)
return NULL;
return &vcpu->arch.guest_tlb[index];
}

struct tlbe *kvmppc_44x_dtlb_search(struct kvm_vcpu *vcpu, gva_t eaddr)
{
unsigned int as = !!(vcpu->arch.msr & MSR_DS);
unsigned int index;

index = kvmppc_44x_tlb_index(vcpu, eaddr, vcpu->arch.pid, as);
if (index == -1)
return NULL;
return &vcpu->arch.guest_tlb[index];
}

static int kvmppc_44x_tlbe_is_writable(struct tlbe *tlbe)
{
return tlbe->word2 & (PPC44x_TLB_SW|PPC44x_TLB_UW);
}

/* Must be called with mmap_sem locked for writing. */
static void kvmppc_44x_shadow_release(struct kvm_vcpu *vcpu,
unsigned int index)
{
struct tlbe *stlbe = &vcpu->arch.shadow_tlb[index];
struct page *page = vcpu->arch.shadow_pages[index];

kunmap(vcpu->arch.shadow_pages[index]);

if (get_tlb_v(stlbe)) {
if (kvmppc_44x_tlbe_is_writable(stlbe))
kvm_release_page_dirty(page);
else
kvm_release_page_clean(page);
}
}

/* Caller must ensure that the specified guest TLB entry is safe to insert into
* the shadow TLB. */
void kvmppc_mmu_map(struct kvm_vcpu *vcpu, u64 gvaddr, gfn_t gfn, u64 asid,
u32 flags)
{
struct page *new_page;
struct tlbe *stlbe;
hpa_t hpaddr;
unsigned int victim;

/* Future optimization: don't overwrite the TLB entry containing the
* current PC (or stack?). */
victim = kvmppc_tlb_44x_pos++;
if (kvmppc_tlb_44x_pos > tlb_44x_hwater)
kvmppc_tlb_44x_pos = 0;
stlbe = &vcpu->arch.shadow_tlb[victim];

/* Get reference to new page. */
down_write(&current->mm->mmap_sem);
new_page = gfn_to_page(vcpu->kvm, gfn);
if (is_error_page(new_page)) {
printk(KERN_ERR "Couldn't get guest page!\n");
kvm_release_page_clean(new_page);
return;
}
hpaddr = page_to_phys(new_page);

/* Drop reference to old page. */
kvmppc_44x_shadow_release(vcpu, victim);
up_write(&current->mm->mmap_sem);

vcpu->arch.shadow_pages[victim] = new_page;

/* XXX Make sure (va, size) doesn't overlap any other
* entries. 440x6 user manual says the result would be
* "undefined." */

/* XXX what about AS? */

stlbe->tid = asid & 0xff;

/* Force TS=1 for all guest mappings. */
/* For now we hardcode 4KB mappings, but it will be important to
* use host large pages in the future. */
stlbe->word0 = (gvaddr & PAGE_MASK) | PPC44x_TLB_VALID | PPC44x_TLB_TS
| PPC44x_TLB_4K;

stlbe->word1 = (hpaddr & 0xfffffc00) | ((hpaddr >> 32) & 0xf);
stlbe->word2 = kvmppc_44x_tlb_shadow_attrib(flags,
vcpu->arch.msr & MSR_PR);
}

void kvmppc_mmu_invalidate(struct kvm_vcpu *vcpu, u64 eaddr, u64 asid)
{
unsigned int pid = asid & 0xff;
int i;

/* XXX Replace loop with fancy data structures. */
down_write(&current->mm->mmap_sem);
for (i = 0; i <= tlb_44x_hwater; i++) {
struct tlbe *stlbe = &vcpu->arch.shadow_tlb[i];
unsigned int tid;

if (!get_tlb_v(stlbe))
continue;

if (eaddr < get_tlb_eaddr(stlbe))
continue;

if (eaddr > get_tlb_end(stlbe))
continue;

tid = get_tlb_tid(stlbe);
if (tid && (tid != pid))
continue;

kvmppc_44x_shadow_release(vcpu, i);
stlbe->word0 = 0;
}
up_write(&current->mm->mmap_sem);
}

/* Invalidate all mappings, so that when they fault back in they will get the
* proper permission bits. */
void kvmppc_mmu_priv_switch(struct kvm_vcpu *vcpu, int usermode)
{
int i;

/* XXX Replace loop with fancy data structures. */
down_write(&current->mm->mmap_sem);
for (i = 0; i <= tlb_44x_hwater; i++) {
kvmppc_44x_shadow_release(vcpu, i);
vcpu->arch.shadow_tlb[i].word0 = 0;
}
up_write(&current->mm->mmap_sem);
}
Loading

0 comments on commit bbf45ba

Please sign in to comment.