Re: [PATCH v11 0/8] KVM: allow mapping non-refcounted pages

From: Paolo Bonzini
Date: Mon Mar 18 2024 - 09:11:22 EST


On Mon, Mar 18, 2024 at 3:07 AM Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:
> > > Fundamentally, what this series is doing is
> > > allowing pfns returned by follow_pte to be mapped into KVM's shadow
> > > MMU without inadvertently translating them into struct pages.
> >
> > As far as I can tell that is really the right thing to do. Yes.
>
> IFF your callers don't need pages and you just want to track the
> mapping in the shadow mmu and never take a refcount that is a good
> thing.

Yes, that's the case and for everything else we can use a function
that goes from guest physical address to struct page with a reference
taken, similar to the current gfn_to_page family of functions.

> But unless I completely misunderstood the series that doesn't seem
> to be the case - it builds a new kvm_follow_pfn API which is another
> of these weird multiplexers like get_user_pages that can to tons of
> different things depending on the flags. And some of that still
> grabs the refcount, right?

Yes, for a couple reasons. First, a lot of the lookup logic is shared
by the two cases; second, it's easier for both developers and
reviewers if you first convert to the new API, and remove the refcount
in a separate commit. Also, you have to do this for every architecture
since we agree that this is the direction that all of them should move
to.

So what we could do, would be to start with something like David's
patches, and move towards forbidding the FOLL_GET flag (the case that
returns the page with elevated refcount) in the new kvm_follow_pfn()
API.

Another possibility is to have a double-underscore version that allows
FOLL_GET, and have the "clean" kvm_follow_pfn() forbid it. So you
would still have the possibility to convert to __kvm_follow_pfn() with
FOLL_GET first, and then when you remove the refcount you switch to
kvm_follow_pfn().


Paolo

> > Completely agree. In my thinking when you go a step further and offload
> > grabbing the page reference to get_user_pages() then you are always on the
> > save side.
>
> Agreed.
>
> > Because then it is no longer the responsibility of the KVM code to get all
> > the rules around that right, instead you are relying on a core functionality
> > which should (at least in theory) do the correct thing.
>
> Exactly.
>