Re: [PATCH RFC v8 01/56] KVM: x86: Add 'fault_is_private' x86 op

From: Michael Roth
Date: Mon Mar 20 2023 - 13:52:53 EST


On Fri, Mar 17, 2023 at 09:51:37PM -0700, Isaku Yamahata wrote:
> On Mon, Feb 20, 2023 at 12:37:52PM -0600,
> Michael Roth <michael.roth@xxxxxxx> wrote:
>
> > This callback is used by the KVM MMU to check whether a #NPF was for a
> > private GPA or not.
> >
> > In some cases the full 64-bit error code for the #NPF will be needed to
> > make this determination, so also update kvm_mmu_do_page_fault() to
> > accept the full 64-bit value so it can be plumbed through to the
> > callback.

Hi Isaku, Zhi,

Thanks for your efforts trying to get us in sync on these shared
interfaces. Would be great to have a common base we can build on for the
SNP/TDX series. You mentioned a couple patches here that I couldn't find
on the list, are you planning to submit these as a separate series?

>
> We can split 64-bit part into the independent patch.

Agreed that makes sense.

>
> > Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> > ---

<snip>

> > +static bool kvm_mmu_fault_is_private(struct kvm *kvm, gpa_t gpa, u64 err)
> > +{
> > + struct kvm_memory_slot *slot;
> > + bool private_fault = false;
> > + gfn_t gfn = gpa_to_gfn(gpa);
> > +
> > + slot = gfn_to_memslot(kvm, gfn);
> > + if (!slot) {
> > + pr_debug("%s: no slot, GFN: 0x%llx\n", __func__, gfn);
> > + goto out;
> > + }
> > +
> > + if (!kvm_slot_can_be_private(slot)) {
> > + pr_debug("%s: slot is not private, GFN: 0x%llx\n", __func__, gfn);
> > + goto out;
> > + }
> > +
> > + if (static_call(kvm_x86_fault_is_private)(kvm, gpa, err, &private_fault))
> > + goto out;
> > +
> > + /*
> > + * Handling below is for UPM self-tests and guests that treat userspace
> > + * as the authority on whether a fault should be private or not.
> > + */
> > + private_fault = kvm_mem_is_private(kvm, gpa >> PAGE_SHIFT);
> > +
> > +out:
> > + pr_debug("%s: GFN: 0x%llx, private: %d\n", __func__, gfn, private_fault);
> > + return private_fault;
> > +}
> > +
> > /*
> > * Return values of handle_mmio_page_fault(), mmu.page_fault(), fast_page_fault(),
> > * and of course kvm_mmu_do_page_fault().
> > @@ -262,11 +293,11 @@ enum {
> > };
> >
> > static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > - u32 err, bool prefetch)
> > + u64 err, bool prefetch)
> > {
> > struct kvm_page_fault fault = {
> > .addr = cr2_or_gpa,
> > - .error_code = err,
> > + .error_code = lower_32_bits(err),
> > .exec = err & PFERR_FETCH_MASK,
> > .write = err & PFERR_WRITE_MASK,
> > .present = err & PFERR_PRESENT_MASK,
> > @@ -280,7 +311,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> > .max_level = KVM_MAX_HUGEPAGE_LEVEL,
> > .req_level = PG_LEVEL_4K,
> > .goal_level = PG_LEVEL_4K,
> > - .is_private = kvm_mem_is_private(vcpu->kvm, cr2_or_gpa >> PAGE_SHIFT),
> > + .is_private = kvm_mmu_fault_is_private(vcpu->kvm, cr2_or_gpa, err),
>
> I don't think kvm_mmu_fault_is_private(). It's too heavy. We can make it
> it's own. I.e. the following.

Is it causing performance issues? If most of that is mainly due to
gfn_to_memslot()/kvm_slot_can_be_private() check, then maybe that part
can be dropped. In the past Sean has mentioned that we shouldn't have to
do kvm_slot_can_be_private() checks prior to kvm_mem_is_private(), but I
haven't tried removing those yet to see if things still work as expected.

>
> From b0f914a1a4d154f076c0294831ce9ef0df7eb3d3 Mon Sep 17 00:00:00 2001
> Message-Id: <b0f914a1a4d154f076c0294831ce9ef0df7eb3d3.1679114841.git.isaku.yamahata@xxxxxxxxx>
> In-Reply-To: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@xxxxxxxxx>
> References: <428a676face7a06a90e59dca1c32941c9b6ee001.1679114841.git.isaku.yamahata@xxxxxxxxx>
> From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> Date: Fri, 17 Mar 2023 11:18:13 -0700
> Subject: [PATCH 2/4] KVM: x86: Add 'fault_is_private' x86 op
>
> This callback is used by the KVM MMU to check whether a KVM page fault was
> for a private GPA or not.
>
> Originally-by: Michael Roth <michael.roth@xxxxxxx>
> Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx>
> ---
> arch/x86/include/asm/kvm-x86-ops.h | 1 +
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/mmu.h | 19 +++++++++++++++++++
> arch/x86/kvm/mmu/mmu_internal.h | 2 +-
> arch/x86/kvm/x86.c | 8 ++++++++
> 5 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index e1f57905c8fe..dc5f18ac0bd5 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -99,6 +99,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
> KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
> KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
> KVM_X86_OP(load_mmu_pgd)
> +KVM_X86_OP(fault_is_private)
> KVM_X86_OP_OPTIONAL(link_private_spt)
> KVM_X86_OP_OPTIONAL(free_private_spt)
> KVM_X86_OP_OPTIONAL(split_private_spt)
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 59196a80c3c8..0382d236fbf4 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1730,6 +1730,7 @@ struct kvm_x86_ops {
>
> void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
> int root_level);
> + bool (*fault_is_private)(struct kvm *kvm, gpa_t gpa, u64 error_code);
>
> int (*link_private_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
> void *private_spt);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 4aaef2132b97..1f21680b9b97 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -289,6 +289,25 @@ static inline gpa_t kvm_translate_gpa(struct kvm_vcpu *vcpu,
> return translate_nested_gpa(vcpu, gpa, access, exception);
> }
>
> +static inline bool kvm_mmu_fault_is_private_default(struct kvm *kvm, gpa_t gpa, u64 err)
> +{
> + struct kvm_memory_slot *slot;
> + gfn_t gfn = gpa_to_gfn(gpa);
> +
> + slot = gfn_to_memslot(kvm, gfn);
> + if (!slot)
> + return false;
> +
> + if (!kvm_slot_can_be_private(slot))
> + return false;
> +
> + /*
> + * Handling below is for UPM self-tests and guests that treat userspace
> + * as the authority on whether a fault should be private or not.
> + */
> + return kvm_mem_is_private(kvm, gfn);
> +}
> +
> static inline gfn_t kvm_gfn_shared_mask(const struct kvm *kvm)
> {
> #ifdef CONFIG_KVM_MMU_PRIVATE
> diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
> index bb5709f1cb57..6b54b069d1ed 100644
> --- a/arch/x86/kvm/mmu/mmu_internal.h
> +++ b/arch/x86/kvm/mmu/mmu_internal.h
> @@ -445,7 +445,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> .max_level = vcpu->kvm->arch.tdp_max_page_level,
> .req_level = PG_LEVEL_4K,
> .goal_level = PG_LEVEL_4K,
> - .is_private = kvm_is_private_gpa(vcpu->kvm, cr2_or_gpa),
> + .is_private = static_call(kvm_x86_fault_is_private)(vcpu->kvm, cr2_or_gpa, err),
> };
> int r;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fd14368c6bc8..0311ab450330 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -9419,6 +9419,14 @@ static inline void kvm_ops_update(struct kvm_x86_init_ops *ops)
> #undef __KVM_X86_OP
>
> kvm_pmu_ops_update(ops->pmu_ops);
> +
> + /*
> + * TODO: Once all backend fills this option, remove this and the default
> + * function.
> + */
> + if (!ops->runtime_ops->fault_is_private)
> + static_call_update(kvm_x86_fault_is_private,
> + kvm_mmu_fault_is_private_default);

I'm not sure about this approach, since the self-tests (and possibly SEV
(which doesn't use a separate #NPF error bit like SNP/TDX)) currently
rely on that kvm_mem_is_private() call to determine whether to handle as
a private fault or not. But to run either of those, we would need to
load the kvm_amd module, which will have already introduced it's own
kvm_x86_fault_is_private implementation via svm_init(), so the handling
provided by kvm_mmu_fault_is_private_default would never be available and
so we wouldn't be able to run the UPM self-tests.

To me it seems like that handling always needs to be in place as a
fallback when not running SNP/TDX. It doesn't necessarily need to be in the
kvm_x86_fault_is_private handler though, maybe some generic handling for
UPM selftests can be pushed down into KVM MMU. Doing so could also
address a race that Sean mentioned between the time kvm_mem_is_private()
is called here (which happens before mmu_invalidate_seq is recorded for
the #NPF) vs. when it actually gets used in __kvm_faultin_pfn().

If we take that approach, then the requirements for specific TDX/SNP
handling are reduced as well, since we only need to check the
encryption/shared bit, and that could maybe be done as a simple setting
that where you tell KVM MMU the position of the bit, whether it
indicates shared vs. private, then both TDX/SNP could re-use a simple
helper to check the #NPF error code and set .is_private based on that.

Then KVM MMU could, if no bit is indicated, just fall back to using the
value of kvm_mem_is_private() somewhere in __kvm_fault_pfn() or
something.

I mentioned this to Sean a while back, which I think is compatible with
what he was looking for:

https://lore.kernel.org/lkml/20230220162210.42rjdgbdwbjiextz@xxxxxxx/

Would be good to get his input before spending too much time adding new
state/configuration stuff in KVM MMU though.

As an interim solution, would my original patch work if we could
confirm that the gfn_to_memslot()/kvm_slot_can_be_private() sequence is
no longer needed?

Thanks!

-Mike

> }
>
> static int kvm_x86_check_processor_compatibility(void)
> --
> 2.25.1
>
>
>
>
> --
> Isaku Yamahata <isaku.yamahata@xxxxxxxxx>