Re: [PATCH v3 3/6] KVM: x86/mmu: explicitly check nx_hugepage in disallowed_hugepage_adjust()

From: Sean Christopherson
Date: Fri Apr 01 2022 - 10:55:55 EST


On Fri, Apr 01, 2022, Mingwei Zhang wrote:
> Explicitly check if a NX huge page is disallowed when determining if a page
> fault needs to be forced to use a smaller sized page. KVM incorrectly
> assumes that the NX huge page mitigation is the only scenario where KVM
> will create a shadow page instead of a huge page. Any scenario that causes
> KVM to zap leaf SPTEs may result in having a SP that can be made huge
> without violating the NX huge page mitigation. E.g. disabling of dirty
> logging, zapping from mmu_notifier due to page migration, guest MTRR
> changes that affect the viability of a huge page, etc...
>
> Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
>
> Reviewed-by: Ben Gardon <bgardon@xxxxxxxxxx>

Reviewed-by (and Acked-by and Tested-by) tags should be dropped when there is a
non-trivial, functional, and/or semantic change between versions. It's often a
judgment call, but in this case I would definitely drop Ben's review. It's usually
better to err on the side of caution for this rule/guideline.

> Signed-off-by: Mingwei Zhang <mizhang@xxxxxxxxxx>
> ---
> arch/x86/kvm/mmu/mmu.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 5cb845fae56e..033609e8b332 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -2896,6 +2896,16 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
> cur_level == fault->goal_level &&
> is_shadow_present_pte(spte) &&
> !is_large_pte(spte)) {
> + struct kvm_mmu_page *sp;
> + u64 page_mask;
> +
> + sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);
> +
> + /* Prevent lpage_disallowed read from moving ahead. */
> + smp_rmb();

smp_rmb() needs to be paired with e.g. smp_wmb(), and the comment is supposed to
tie the two together (the reason checkpatch warns about missing comments on
barriers is to try to prevent sprinkling bogus barriers everywhere). These types
of barriers always need to be paired with something, as both the reader(s) and
the writer need specific, coordinated ordering. E.g. reading lpage_disallowed
after checking for a present SPTE is meaningless if the writer doesn't set
lpage_disallowed before setting the SPTE present (which is the purpose of patch 1).
I didn't add the smp_wmb() in patch 1 because it's not necessary as of that patch;
the barrier only becomes necessary when the concurrent reader comes along in this
patch.
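
To illustrate the pairing outside of KVM, here's a minimal sketch of the generic
publish/observe pattern (the struct, the function names, and the "published"
pointer are purely illustrative, not actual KVM code):

#include <linux/types.h>
#include <linux/compiler.h>
#include <asm/barrier.h>

struct page_meta {
        bool lpage_disallowed;
};

/* NULL until the writer "publishes" it, analogous to a present SPTE. */
static struct page_meta *published;

/* Writer: set the flag _before_ making the object visible. */
static void publish(struct page_meta *sp, bool disallowed)
{
        sp->lpage_disallowed = disallowed;

        /* Pairs with the smp_rmb() in reader(). */
        smp_wmb();
        WRITE_ONCE(published, sp);
}

/* Reader: observe the object first, then its flag. */
static bool reader(void)
{
        struct page_meta *sp = READ_ONCE(published);

        if (!sp)
                return false;

        /* Pairs with the smp_wmb() in publish(). */
        smp_rmb();
        return sp->lpage_disallowed;
}

Without both sides of the pair, the reader can observe a non-NULL pointer but a
stale lpage_disallowed. Applied to this patch, the comments should tie the two
sides together, e.g.: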

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9f73c175845e..041b1ab838f0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2901,7 +2901,12 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_

sp = to_shadow_page(spte & PT64_BASE_ADDR_MASK);

- /* Prevent lpage_disallowed read from moving ahead. */
+ /*
+ * Ensure lpage_disallowed is read after checking for a present
+ * SPTE. A different vCPU may be concurrently installing the
+ * shadow page if mmu_lock is held for read. Pairs with the
+ * smp_wmb() in kvm_tdp_mmu_map().
+ */
smp_rmb();

if (!sp->lpage_disallowed)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 1f44826cf794..2422684f1dc3 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1186,6 +1186,13 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

sp->lpage_disallowed = account_nx;

+ /*
+ * Ensure lpage_disallowed is visible before the SP is
+ * marked present, as mmu_lock is held for read. Pairs
+ * with the smp_rmb() in disallowed_hugepage_adjust().
+ */
+ smp_wmb();
+
if (tdp_mmu_link_sp(kvm, &iter, sp, true)) {
tdp_mmu_free_sp(sp);
break;