Re: [PATCH v5 08/13] KVM: Use memfile_pfn_ops to obtain pfn for private pages

From: Chao Peng
Date: Fri Apr 08 2022 - 10:07:45 EST


On Mon, Mar 28, 2022 at 11:56:06PM +0000, Sean Christopherson wrote:
> On Thu, Mar 10, 2022, Chao Peng wrote:
> > @@ -2217,4 +2220,34 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
> > /* Max number of entries allowed for each kvm dirty ring */
> > #define KVM_DIRTY_RING_MAX_ENTRIES 65536
> >
> > +#ifdef CONFIG_MEMFILE_NOTIFIER
> > +static inline long kvm_memfile_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
> > + int *order)
> > +{
> > + pgoff_t index = gfn - slot->base_gfn +
> > + (slot->private_offset >> PAGE_SHIFT);
>
> This is broken for 32-bit kernels, where gfn_t is a 64-bit value but pgoff_t is a
> 32-bit value. There's no reason to support this for 32-bit kernels, so...
>
> The easiest fix, and likely most maintainable for other code too, would be to
> add a dedicated CONFIG for private memory, and then have KVM check that for all
> the memfile stuff. x86 can then select it only for 64-bit kernels, and in turn
> select MEMFILE_NOTIFIER iff private memory is supported.

Looks good.
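
For reference, the truncation can be reproduced in userspace. This is only a sketch: the demo_* names and fixed-width typedefs are stand-ins for gfn_t and a 32-bit pgoff_t, not the kernel definitions, and the offset value is chosen just to push the index past 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions, not kernel types): on a 32-bit kernel
 * gfn_t is 64 bits wide while pgoff_t is only 32 bits wide. */
typedef uint64_t demo_gfn_t;
typedef uint32_t demo_pgoff32_t;

#define DEMO_PAGE_SHIFT 12

/* Same arithmetic as the index computation in the patch, but the result is
 * narrowed to 32 bits, as it would be with a 32-bit pgoff_t. */
static demo_pgoff32_t demo_index_32bit(demo_gfn_t gfn, demo_gfn_t base_gfn,
				       uint64_t private_offset)
{
	return (demo_pgoff32_t)(gfn - base_gfn +
				(private_offset >> DEMO_PAGE_SHIFT));
}

/* Same arithmetic with no narrowing, as on a 64-bit kernel. */
static uint64_t demo_index_64bit(demo_gfn_t gfn, demo_gfn_t base_gfn,
				 uint64_t private_offset)
{
	return gfn - base_gfn + (private_offset >> DEMO_PAGE_SHIFT);
}
```

With gfn = base_gfn + 5 and a private_offset of 1 << 44, the 64-bit helper yields 0x100000005 while the 32-bit one silently yields 5.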

>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index ca7b2a6a452a..ee9c8c155300 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -48,7 +48,9 @@ config KVM
> select SRCU
> select INTERVAL_TREE
> select HAVE_KVM_PM_NOTIFIER if PM
> - select MEMFILE_NOTIFIER
> + select HAVE_KVM_PRIVATE_MEM if X86_64
> + select MEMFILE_NOTIFIER if HAVE_KVM_PRIVATE_MEM
> +
> help
> Support hosting fully virtualized guest machines using hardware
> virtualization extensions. You will need a fairly recent
>
> And in addition to replacing checks on CONFIG_MEMFILE_NOTIFIER, the probing of
> whether or not KVM_MEM_PRIVATE is allowed can be:
>
> @@ -1499,23 +1499,19 @@ static void kvm_replace_memslot(struct kvm *kvm,
> }
> }
>
> -bool __weak kvm_arch_private_memory_supported(struct kvm *kvm)
> -{
> - return false;
> -}
> -
> static int check_memory_region_flags(struct kvm *kvm,
> const struct kvm_userspace_memory_region *mem)
> {
> u32 valid_flags = KVM_MEM_LOG_DIRTY_PAGES;
>
> - if (kvm_arch_private_memory_supported(kvm))
> - valid_flags |= KVM_MEM_PRIVATE;
> -
> #ifdef __KVM_HAVE_READONLY_MEM
> valid_flags |= KVM_MEM_READONLY;
> #endif
>
> +#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
> + valid_flags |= KVM_MEM_PRIVATE;
> +#endif
> +
> if (mem->flags & ~valid_flags)
> return -EINVAL;
>
> > +
> > + return slot->pfn_ops->get_lock_pfn(file_inode(slot->private_file),
> > + index, order);
>
> In a similar vein, get_lock_pfn() shouldn't return a "long". KVM likely won't use
> these APIs on 32-bit kernels, but that may not hold true for other subsystems, and
> this code is confusing and technically wrong. The pfns for struct page squeeze
> into an unsigned long because PAE support is capped at 64GB, but casting to a
> signed long could result in a pfn with bit 31 set being misinterpreted as an error.
>
> Even returning an "unsigned long" for the pfn is wrong. It "works" for the shmem
> code because shmem deals only with struct page, but it's technically wrong, especially
> since one of the selling points of this approach is that it can work without struct
> page.

Hmmm, that's correct.
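
The sign problem is easy to see with a 32-bit stand-in for "long". A minimal sketch, assuming a two's complement target; the demo_* names are hypothetical and only model the "negative return means error" convention:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for "long" on a 32-bit kernel (assumption for illustration). */
typedef int32_t demo_long32_t;

/* Models returning a pfn through a signed 32-bit return value. The
 * conversion of an out-of-range value is implementation-defined before
 * C23; mainstream compilers wrap to two's complement. */
static demo_long32_t demo_return_pfn_as_long(uint32_t pfn)
{
	return (demo_long32_t)pfn;
}

/* Models a caller that treats any negative return as -errno. */
static bool demo_is_error(demo_long32_t ret)
{
	return ret < 0;
}
```

A pfn of 0x7fffffff passes through fine, but 0x80000000 (bit 31 set) comes back negative and would be taken for an error.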

>
> OUT params suck, but I don't see a better option than having the return value be
> 0/-errno, with "pfn_t *pfn" for the resolved pfn.
>
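
One possible shape for the 0/-errno plus OUT param convention, as a userspace sketch only. The demo_pfn_t typedef, struct demo_backing and demo_get_pfn() are hypothetical and do not reflect the real pfn_t or memfile_pfn_ops definitions:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit pfn type for the demo; wide enough on any arch. */
typedef uint64_t demo_pfn_t;

/* Hypothetical backing store: a table of pfns indexed by page offset. */
struct demo_backing {
	const demo_pfn_t *pfns;
	size_t nr_pages;
};

/* Returns 0 on success or -errno on failure. The resolved pfn and order
 * are passed back through OUT params, so no pfn value can ever be
 * confused with an error code. */
static int demo_get_pfn(const struct demo_backing *b, uint64_t index,
			demo_pfn_t *pfn, int *order)
{
	if (!b || !pfn || !order)
		return -EINVAL;
	if (index >= b->nr_pages)
		return -EFAULT;

	*pfn = b->pfns[index];
	*order = 0;	/* the demo only hands out order-0 pages */
	return 0;
}
```

With this shape a pfn that has bit 31 (or bit 63) set is returned intact, and the caller checks only the int return value for errors.
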
> > +}
> > +
> > +static inline void kvm_memfile_put_pfn(struct kvm_memory_slot *slot,
> > + kvm_pfn_t pfn)
> > +{
> > + slot->pfn_ops->put_unlock_pfn(pfn);
> > +}
> > +
> > +#else
> > +static inline long kvm_memfile_get_pfn(struct kvm_memory_slot *slot, gfn_t gfn,
> > + int *order)
> > +{
>
> This should be a WARN_ON() as its usage should be guarded by a KVM_PRIVATE_MEM
> check, and private memslots should be disallowed in this case.
>
> Alternatively, it might be a good idea to #ifdef these out entirely and not provide
> stubs. That'd likely require a stub or two in arch code, but overall it might be
> less painful in the long run, e.g. would force us to more carefully consider the
> touch points for private memory. Definitely not a requirement, just an idea.

Makes sense, let me try.
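
To make the two options concrete, a rough userspace sketch of the stub variant. DEMO_HAVE_PRIVATE_MEM, demo_private_get_pfn() and demo_warn_count are made up for illustration, and the fprintf() only stands in for a kernel WARN_ON():

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Build with -DDEMO_HAVE_PRIVATE_MEM to get the "real" path. */
#ifdef DEMO_HAVE_PRIVATE_MEM
static int demo_private_get_pfn(unsigned long long index,
				unsigned long long *pfn)
{
	*pfn = index + 0x1000;	/* arbitrary mapping for the demo */
	return 0;
}
#else
static int demo_warn_count;

/* Stub: callers are expected to be guarded by a config check, so reaching
 * this is a bug. Warn loudly and fail instead of returning a bogus pfn. */
static int demo_private_get_pfn(unsigned long long index,
				unsigned long long *pfn)
{
	(void)index;
	(void)pfn;
	demo_warn_count++;
	fprintf(stderr, "demo: private memory helper used without support\n");
	return -EOPNOTSUPP;
}
#endif
```

Dropping the #else branch entirely gives the other option: any unguarded caller then fails to build rather than warning at runtime.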

Thanks,
Chao