KVM: x86/mmu: Mark page/folio accessed only when zapping leaf SPTEs
Now that KVM doesn't clobber Accessed bits of shadow-present SPTEs,
e.g. when prefetching, mark folios as accessed only when zapping leaf
SPTEs, which is a rough heuristic for "only in response to an mmu_notifier
invalidation".  Page aging and LRUs are tolerant of false negatives, i.e.
KVM doesn't need to be precise for correctness, and re-marking folios as
accessed when zapping entire roots or when zapping collapsible SPTEs is
expensive and adds very little value.

E.g. when a VM is dying, all of its memory is being freed; marking folios
accessed at that time provides no known value.  Similarly, because KVM
marks folios as accessed when creating SPTEs, marking all folios as
accessed when userspace happens to delete a memslot doesn't add value.
The folio was marked accessed when the old SPTE was created, and will be
marked accessed yet again if a vCPU accesses the pfn again after reloading
a new root.  Zapping collapsible SPTEs is a similar story; marking folios
accessed just because userspace disables dirty logging is a side effect of
KVM behavior, not a deliberate goal.

As an intermediate step, a.k.a. bisection point, towards *never* marking
folios accessed when dropping SPTEs, mark folios accessed when the primary
MMU might be invalidating mappings, as such zappings are not KVM initiated,
i.e. might actually be related to page aging and LRU activity.

Note, x86 is the only KVM architecture that "double dips"; every other
arch marks pfns as accessed only when mapping into the guest, not when
mapping into the guest _and_ when removing from the guest.
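
To make the new policy concrete, here is a minimal, illustrative C sketch of
where the accessed update now lives; the types, bit positions, and helpers are
stubbed out for the example and do not match the kernel's definitions
bit-for-bit (compare with the tdp_mmu_zap_leafs() hunk further down).

  /*
   * Illustrative sketch only: simplified from the tdp_mmu_zap_leafs() change
   * below, with stand-in types and helpers.
   */
  #include <stdbool.h>
  #include <stdint.h>

  typedef uint64_t u64;

  #define SHADOW_NONPRESENT_VALUE 0ull
  #define SPTE_ACCESSED_BIT       (1ull << 5)  /* stand-in for the real A bit */

  static bool is_accessed_spte(u64 spte)
  {
          return spte & SPTE_ACCESSED_BIT;
  }

  static u64 spte_to_pfn(u64 spte)
  {
          return (spte >> 12) & 0xFFFFFFFFFULL;  /* stand-in PFN extraction */
  }

  static void kvm_set_pfn_accessed(u64 pfn)
  {
          /* In the kernel, this marks the backing page/folio as accessed. */
          (void)pfn;
  }

  /*
   * Only the leaf-zap path, i.e. the rough proxy for an mmu_notifier
   * invalidation, propagates the Accessed bit to the primary MMU.  Zapping
   * entire roots or collapsible SPTEs no longer touches the folio at all.
   */
  static void zap_leaf_spte(u64 *sptep)
  {
          u64 old_spte = *sptep;

          *sptep = SHADOW_NONPRESENT_VALUE;

          if (is_accessed_spte(old_spte))
                  kvm_set_pfn_accessed(spte_to_pfn(old_spte));
  }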

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20241010182427.1434605-10-seanjc@google.com>
Sean Christopherson authored and Paolo Bonzini committed Oct 25, 2024
1 parent aa85986 commit 5f6a3ba
Showing 3 changed files with 43 additions and 44 deletions.
76 changes: 39 additions & 37 deletions Documentation/virt/kvm/locking.rst
@@ -147,49 +147,51 @@ Then, we can ensure the dirty bitmaps is correctly set for a gfn.

2) Dirty bit tracking

In the origin code, the spte can be fast updated (non-atomically) if the
In the original code, the spte can be fast updated (non-atomically) if the
spte is read-only and the Accessed bit has already been set since the
Accessed bit and Dirty bit can not be lost.

But it is not true after fast page fault since the spte can be marked
writable between reading spte and updating spte. Like below case:

+------------------------------------------------------------------------+
| At the beginning:: |
| |
| spte.W = 0 |
| spte.Accessed = 1 |
+------------------------------------+-----------------------------------+
| CPU 0: | CPU 1: |
+------------------------------------+-----------------------------------+
| In mmu_spte_clear_track_bits():: | |
| | |
| old_spte = *spte; | |
| | |
| | |
| /* 'if' condition is satisfied. */| |
| if (old_spte.Accessed == 1 && | |
| old_spte.W == 0) | |
| spte = 0ull; | |
+------------------------------------+-----------------------------------+
| | on fast page fault path:: |
| | |
| | spte.W = 1 |
| | |
| | memory write on the spte:: |
| | |
| | spte.Dirty = 1 |
+------------------------------------+-----------------------------------+
| :: | |
| | |
| else | |
| old_spte = xchg(spte, 0ull) | |
| if (old_spte.Accessed == 1) | |
| kvm_set_pfn_accessed(spte.pfn);| |
| if (old_spte.Dirty == 1) | |
| kvm_set_pfn_dirty(spte.pfn); | |
| OOPS!!! | |
+------------------------------------+-----------------------------------+
+-------------------------------------------------------------------------+
| At the beginning:: |
| |
| spte.W = 0 |
| spte.Accessed = 1 |
+-------------------------------------+-----------------------------------+
| CPU 0: | CPU 1: |
+-------------------------------------+-----------------------------------+
| In mmu_spte_update():: | |
| | |
| old_spte = *spte; | |
| | |
| | |
| /* 'if' condition is satisfied. */ | |
| if (old_spte.Accessed == 1 && | |
| old_spte.W == 0) | |
| spte = new_spte; | |
+-------------------------------------+-----------------------------------+
| | on fast page fault path:: |
| | |
| | spte.W = 1 |
| | |
| | memory write on the spte:: |
| | |
| | spte.Dirty = 1 |
+-------------------------------------+-----------------------------------+
| :: | |
| | |
| else | |
| old_spte = xchg(spte, new_spte);| |
| if (old_spte.Accessed && | |
| !new_spte.Accessed) | |
| flush = true; | |
| if (old_spte.Dirty && | |
| !new_spte.Dirty) | |
| flush = true; | |
| OOPS!!! | |
+-------------------------------------+-----------------------------------+

The Dirty bit is lost in this case.
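
For illustration, a minimal user-space C sketch of the rule the table above is
describing; the SPTE bit layout and the spte_update() helper are invented for
the example and do not correspond to the kernel's code.  The point is that the
old value acted upon must come from the atomic exchange itself, not from an
earlier, separately loaded copy, otherwise a Dirty bit set by CPU 1 in the
window is lost.

  /* Illustrative sketch, not kernel code: bit layout and helper are made up. */
  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  #define SPTE_W        (1ull << 1)   /* writable */
  #define SPTE_ACCESSED (1ull << 5)
  #define SPTE_DIRTY    (1ull << 6)

  /*
   * Mirrors the CPU 0 column above: CPU 1 may set SPTE_W (fast page fault)
   * and then SPTE_DIRTY at any time, so the old value must be taken from the
   * atomic exchange; acting on a stale copy is what loses the Dirty bit.
   * Returns whether a TLB flush is needed, which after this patch is the only
   * action taken when an Accessed or Dirty bit is cleared.
   */
  static bool spte_update(_Atomic uint64_t *sptep, uint64_t new_spte)
  {
          uint64_t old_spte = atomic_exchange(sptep, new_spte);

          return ((old_spte & SPTE_ACCESSED) && !(new_spte & SPTE_ACCESSED)) ||
                 ((old_spte & SPTE_DIRTY) && !(new_spte & SPTE_DIRTY));
  }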

4 changes: 1 addition & 3 deletions arch/x86/kvm/mmu/mmu.c
@@ -542,10 +542,8 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
* to guarantee consistency between TLB and page tables.
*/

if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte)) {
if (is_accessed_spte(old_spte) && !is_accessed_spte(new_spte))
flush = true;
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}

if (is_dirty_spte(old_spte) && !is_dirty_spte(new_spte))
flush = true;
7 changes: 3 additions & 4 deletions arch/x86/kvm/mmu/tdp_mmu.c
@@ -520,10 +520,6 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
if (was_present && !was_leaf &&
(is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);

if (was_leaf && is_accessed_spte(old_spte) &&
(!is_present || !is_accessed_spte(new_spte) || pfn_changed))
kvm_set_pfn_accessed(spte_to_pfn(old_spte));
}

static inline int __must_check __tdp_mmu_set_spte_atomic(struct tdp_iter *iter,
@@ -865,6 +861,9 @@ static bool tdp_mmu_zap_leafs(struct kvm *kvm, struct kvm_mmu_page *root,

tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);

if (is_accessed_spte(iter.old_spte))
kvm_set_pfn_accessed(spte_to_pfn(iter.old_spte));

/*
* Zappings SPTEs in invalid roots doesn't require a TLB flush,
* see kvm_tdp_mmu_zap_invalidated_roots() for details.
