Re: [PATCH 20/24] KVM: x86/mmu: Use a dedicated bit to track shadow/MMU-present SPTEs

From: Sean Christopherson
Date: Mon Mar 08 2021 - 16:50:24 EST


On Mon, Mar 08, 2021, Sean Christopherson wrote:
> On Mon, Mar 08, 2021, Tom Lendacky wrote:
> > On the hypervisor, I see the following:
> >
> > [ 55.886136] get_mmio_spte: detect reserved bits on spte, addr 0xffc12792, dump hierarchy:
> > [ 55.895284] ------ spte 0x1344a0827 level 4.
> > [ 55.900059] ------ spte 0x134499827 level 3.
> > [ 55.904877] ------ spte 0x165bf0827 level 2.
> > [ 55.909651] ------ spte 0xff800ffc12817 level 1.
>
> Ah fudge. I know what's wrong. The MMIO generation uses bit 11, which means
> is_shadow_present_pte() can get a false positive on high MMIO generations. This
> particular error can be squashed by explicitly checking for MMIO sptes in
> get_mmio_spte(), but I'm guessing there are other flows where a false positive
> is fatal (probably the __pte_list_remove bug below...). The safe thing to do is
> to steal bit 11 from MMIO SPTEs so that shadow present PTEs are the only thing
> that sets that bit.
>
> I'll reproduce by stuffing the MMIO generation and get a patch posted. Sorry :-/
>
> > When I kill the guest, I get a kernel panic:
> >
> > [ 95.539683] __pte_list_remove: 0000000040567a6a 0->BUG
> > [ 95.545481] kernel BUG at arch/x86/kvm/mmu/mmu.c:896!

Fudging get_mmio_spte() did allow the guest to boot, but as expected KVM panicked
during guest shutdown. Initial testing on the below changes look good, I'll
test more thoroughly and (hopefully) post later today.

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index b53036d9ddf3..e6e683e0fdcd 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -101,11 +101,11 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
#undef SHADOW_ACC_TRACK_SAVED_MASK

/*
- * Due to limited space in PTEs, the MMIO generation is a 20 bit subset of
+ * Due to limited space in PTEs, the MMIO generation is a 19 bit subset of
* the memslots generation and is derived as follows:
*
- * Bits 0-8 of the MMIO generation are propagated to spte bits 3-11
- * Bits 9-19 of the MMIO generation are propagated to spte bits 52-62
+ * Bits 0-7 of the MMIO generation are propagated to spte bits 3-10
+ * Bits 8-18 of the MMIO generation are propagated to spte bits 52-62
*
* The KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS flag is intentionally not included in
* the MMIO generation number, as doing so would require stealing a bit from
@@ -116,7 +116,7 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
*/

#define MMIO_SPTE_GEN_LOW_START 3
-#define MMIO_SPTE_GEN_LOW_END 11
+#define MMIO_SPTE_GEN_LOW_END 10

#define MMIO_SPTE_GEN_HIGH_START 52
#define MMIO_SPTE_GEN_HIGH_END 62
@@ -125,12 +125,14 @@ static_assert(!(EPT_SPTE_MMU_WRITABLE & SHADOW_ACC_TRACK_SAVED_MASK));
MMIO_SPTE_GEN_LOW_START)
#define MMIO_SPTE_GEN_HIGH_MASK GENMASK_ULL(MMIO_SPTE_GEN_HIGH_END, \
MMIO_SPTE_GEN_HIGH_START)
+static_assert(!(SPTE_MMU_PRESENT_MASK &
+ (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK)));

#define MMIO_SPTE_GEN_LOW_BITS (MMIO_SPTE_GEN_LOW_END - MMIO_SPTE_GEN_LOW_START + 1)
#define MMIO_SPTE_GEN_HIGH_BITS (MMIO_SPTE_GEN_HIGH_END - MMIO_SPTE_GEN_HIGH_START + 1)

/* remember to adjust the comment above as well if you change these */
-static_assert(MMIO_SPTE_GEN_LOW_BITS == 9 && MMIO_SPTE_GEN_HIGH_BITS == 11);
+static_assert(MMIO_SPTE_GEN_LOW_BITS == 8 && MMIO_SPTE_GEN_HIGH_BITS == 11);

#define MMIO_SPTE_GEN_LOW_SHIFT (MMIO_SPTE_GEN_LOW_START - 0)
#define MMIO_SPTE_GEN_HIGH_SHIFT (MMIO_SPTE_GEN_HIGH_START - MMIO_SPTE_GEN_LOW_BITS)