Skip to content

Commit

Permalink
KVM: Reinstate gfn_to_pfn_cache with invalidation support
Browse files Browse the repository at this point in the history
This can be used in two modes. There is an atomic mode where the cached
mapping is accessed while holding the rwlock, and a mode where the
physical address is used by a vCPU in guest mode.

For the latter case, an invalidation will wake the vCPU with the new
KVM_REQ_GPC_INVALIDATE, and the architecture will need to refresh any
caches it still needs to access before entering guest mode again.

Only one vCPU can be targeted by the wake requests; it's simple enough
to make it wake all vCPUs or even a mask but I don't see a use case for
that additional complexity right now.

Invalidation happens from the invalidate_range_start MMU notifier, which
needs to be able to sleep in order to wake the vCPU and wait for it.

This means that revalidation potentially needs to "wait" for the MMU
operation to complete and the invalidate_range_end notifier to be
invoked. Like the vCPU when it takes a page fault in that period, we
just spin — fixing that in a future patch by implementing an actual
*wait* may be another part of shaving this particularly hirsute yak.

As noted in the comments in the function itself, the only case where
the invalidate_range_start notifier is expected to be called *without*
being able to sleep is when the OOM reaper is killing the process. In
that case, we expect the vCPU threads already to have exited, and thus
there will be nothing to wake, and no reason to wait. So we clear the
KVM_REQUEST_WAIT bit and send the request anyway, then complain loudly
if there actually *was* anything to wake up.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-3-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  • Loading branch information
David Woodhouse authored and Paolo Bonzini committed Jan 7, 2022
1 parent 2efd61a commit 982ed0d
Show file tree
Hide file tree
Showing 10 changed files with 517 additions and 27 deletions.
1 change: 1 addition & 0 deletions arch/x86/kvm/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ config KVM
select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select HAVE_KVM_IRQCHIP
select HAVE_KVM_PFNCACHE
select HAVE_KVM_IRQFD
select HAVE_KVM_DIRTY_RING
select IRQ_BYPASS_MANAGER
Expand Down
103 changes: 103 additions & 0 deletions include/linux/kvm_host.h
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,7 @@ static inline bool is_error_page(struct page *page)
#define KVM_REQ_UNBLOCK 2
#define KVM_REQ_UNHALT 3
#define KVM_REQ_VM_DEAD (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQ_GPC_INVALIDATE (5 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
#define KVM_REQUEST_ARCH_BASE 8

#define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \
Expand Down Expand Up @@ -593,6 +594,10 @@ struct kvm {
unsigned long mn_active_invalidate_count;
struct rcuwait mn_memslots_update_rcuwait;

/* For management / invalidation of gfn_to_pfn_caches */
spinlock_t gpc_lock;
struct list_head gpc_list;

/*
* created_vcpus is protected by kvm->lock, and is incremented
* at the beginning of KVM_CREATE_VCPU. online_vcpus is only
Expand Down Expand Up @@ -1099,6 +1104,104 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data,
unsigned long len);
void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn);

/**
* kvm_gfn_to_pfn_cache_init - prepare a cached kernel mapping and HPA for a
* given guest physical address.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @vcpu: vCPU to be used for marking pages dirty and to be woken on
* invalidation.
* @guest_uses_pa: indicates that the resulting host physical PFN is used while
* @vcpu is IN_GUEST_MODE so invalidations should wake it.
* @kernel_map: requests a kernel virtual mapping (kmap / memremap).
* @gpa: guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
* @dirty: mark the cache dirty immediately.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
* -EFAULT for an untranslatable guest physical address.
*
* This primes a gfn_to_pfn_cache and links it into the @kvm's list for
* invalidations to be processed. Invalidation callbacks to @vcpu using
* %KVM_REQ_GPC_INVALIDATE will occur only for MMU notifiers, not for KVM
* memslot changes. Callers are required to use kvm_gfn_to_pfn_cache_check()
* to ensure that the cache is valid before accessing the target page.
*/
int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
struct kvm_vcpu *vcpu, bool guest_uses_pa,
bool kernel_map, gpa_t gpa, unsigned long len,
bool dirty);

/**
* kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: current guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
* @dirty: mark the cache dirty immediately.
*
* @return: %true if the cache is still valid and the address matches.
* %false if the cache is not valid.
*
* Callers outside IN_GUEST_MODE context should hold a read lock on @gpc->lock
* while calling this function, and then continue to hold the lock until the
* access is complete.
*
* Callers in IN_GUEST_MODE may do so without locking, although they should
* still hold a read lock on kvm->scru for the memslot checks.
*/
bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpa_t gpa, unsigned long len);

/**
* kvm_gfn_to_pfn_cache_refresh - update a previously initialized cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
* @gpa: updated guest physical address to map.
* @len: sanity check; the range being access must fit a single page.
* @dirty: mark the cache dirty immediately.
*
* @return: 0 for success.
* -EINVAL for a mapping which would cross a page boundary.
* -EFAULT for an untranslatable guest physical address.
*
* This will attempt to refresh a gfn_to_pfn_cache. Note that a successful
* returm from this function does not mean the page can be immediately
* accessed because it may have raced with an invalidation. Callers must
* still lock and check the cache status, as this function does not return
* with the lock still held to permit access.
*/
int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc,
gpa_t gpa, unsigned long len, bool dirty);

/**
* kvm_gfn_to_pfn_cache_unmap - temporarily unmap a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
*
* This unmaps the referenced page and marks it dirty, if appropriate. The
* cache is left in the invalid state but at least the mapping from GPA to
* userspace HVA will remain cached and can be reused on a subsequent
* refresh.
*/
void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);

/**
* kvm_gfn_to_pfn_cache_destroy - destroy and unlink a gfn_to_pfn_cache.
*
* @kvm: pointer to kvm instance.
* @gpc: struct gfn_to_pfn_cache object.
*
* This removes a cache from the @kvm's list to be processed on MMU notifier
* invocation.
*/
void kvm_gfn_to_pfn_cache_destroy(struct kvm *kvm, struct gfn_to_pfn_cache *gpc);

void kvm_sigset_activate(struct kvm_vcpu *vcpu);
void kvm_sigset_deactivate(struct kvm_vcpu *vcpu);

Expand Down
18 changes: 18 additions & 0 deletions include/linux/kvm_types.h
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ struct kvm_memslots;
enum kvm_mr_change;

#include <linux/types.h>
#include <linux/spinlock_types.h>

#include <asm/kvm_types.h>

Expand Down Expand Up @@ -53,6 +54,23 @@ struct gfn_to_hva_cache {
struct kvm_memory_slot *memslot;
};

struct gfn_to_pfn_cache {
u64 generation;
gpa_t gpa;
unsigned long uhva;
struct kvm_memory_slot *memslot;
struct kvm_vcpu *vcpu;
struct list_head list;
rwlock_t lock;
void *khva;
kvm_pfn_t pfn;
bool active;
bool valid;
bool dirty;
bool kernel_map;
bool guest_uses_pa;
};

#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
/*
* Memory caches are used to preallocate memory ahead of various MMU flows,
Expand Down
3 changes: 3 additions & 0 deletions virt/kvm/Kconfig
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,9 @@
config HAVE_KVM
bool

config HAVE_KVM_PFNCACHE
bool

config HAVE_KVM_IRQCHIP
bool

Expand Down
1 change: 1 addition & 0 deletions virt/kvm/Makefile.kvm
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,4 @@ kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
2 changes: 1 addition & 1 deletion virt/kvm/dirty_ring.c
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
#include <linux/vmalloc.h>
#include <linux/kvm_dirty_ring.h>
#include <trace/events/kvm.h>
#include "mmu_lock.h"
#include "kvm_mm.h"

int __weak kvm_cpu_dirty_log_size(void)
{
Expand Down
12 changes: 9 additions & 3 deletions virt/kvm/kvm_main.c
Original file line number Diff line number Diff line change
Expand Up @@ -59,7 +59,7 @@

#include "coalesced_mmio.h"
#include "async_pf.h"
#include "mmu_lock.h"
#include "kvm_mm.h"
#include "vfio.h"

#define CREATE_TRACE_POINTS
Expand Down Expand Up @@ -711,6 +711,9 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
kvm->mn_active_invalidate_count++;
spin_unlock(&kvm->mn_invalidate_lock);

gfn_to_pfn_cache_invalidate_start(kvm, range->start, range->end,
hva_range.may_block);

__kvm_handle_hva_range(kvm, &hva_range);

return 0;
Expand Down Expand Up @@ -1071,6 +1074,9 @@ static struct kvm *kvm_create_vm(unsigned long type)
rcuwait_init(&kvm->mn_memslots_update_rcuwait);
xa_init(&kvm->vcpu_array);

INIT_LIST_HEAD(&kvm->gpc_list);
spin_lock_init(&kvm->gpc_lock);

INIT_LIST_HEAD(&kvm->devices);

BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
Expand Down Expand Up @@ -2539,8 +2545,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
* 2): @write_fault = false && @writable, @writable will tell the caller
* whether the mapping is writable.
*/
static kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
bool write_fault, bool *writable)
kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
bool write_fault, bool *writable)
{
struct vm_area_struct *vma;
kvm_pfn_t pfn = 0;
Expand Down
44 changes: 44 additions & 0 deletions virt/kvm/kvm_mm.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
// SPDX-License-Identifier: GPL-2.0-only

#ifndef __KVM_MM_H__
#define __KVM_MM_H__ 1

/*
* Architectures can choose whether to use an rwlock or spinlock
* for the mmu_lock. These macros, for use in common code
* only, avoids using #ifdefs in places that must deal with
* multiple architectures.
*/

#ifdef KVM_HAVE_MMU_RWLOCK
#define KVM_MMU_LOCK_INIT(kvm) rwlock_init(&(kvm)->mmu_lock)
#define KVM_MMU_LOCK(kvm) write_lock(&(kvm)->mmu_lock)
#define KVM_MMU_UNLOCK(kvm) write_unlock(&(kvm)->mmu_lock)
#define KVM_MMU_READ_LOCK(kvm) read_lock(&(kvm)->mmu_lock)
#define KVM_MMU_READ_UNLOCK(kvm) read_unlock(&(kvm)->mmu_lock)
#else
#define KVM_MMU_LOCK_INIT(kvm) spin_lock_init(&(kvm)->mmu_lock)
#define KVM_MMU_LOCK(kvm) spin_lock(&(kvm)->mmu_lock)
#define KVM_MMU_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#define KVM_MMU_READ_LOCK(kvm) spin_lock(&(kvm)->mmu_lock)
#define KVM_MMU_READ_UNLOCK(kvm) spin_unlock(&(kvm)->mmu_lock)
#endif /* KVM_HAVE_MMU_RWLOCK */

kvm_pfn_t hva_to_pfn(unsigned long addr, bool atomic, bool *async,
bool write_fault, bool *writable);

#ifdef CONFIG_HAVE_KVM_PFNCACHE
void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
unsigned long start,
unsigned long end,
bool may_block);
#else
static inline void gfn_to_pfn_cache_invalidate_start(struct kvm *kvm,
unsigned long start,
unsigned long end,
bool may_block)
{
}
#endif /* HAVE_KVM_PFNCACHE */

#endif /* __KVM_MM_H__ */
23 changes: 0 additions & 23 deletions virt/kvm/mmu_lock.h

This file was deleted.

Loading

0 comments on commit 982ed0d

Please sign in to comment.