Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages
From: Yan Zhao
Date: Wed Jun 18 2025 - 03:00:31 EST
On Tue, Jun 17, 2025 at 11:44:34PM -0700, Vishal Annapurve wrote:
> On Tue, Jun 17, 2025 at 11:34 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 17, 2025 at 11:21:41PM -0700, Vishal Annapurve wrote:
> > > On Tue, Jun 17, 2025 at 11:15 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > > >
> > > > On Tue, Jun 17, 2025 at 09:33:02PM -0700, Vishal Annapurve wrote:
> > > > > On Tue, Jun 17, 2025 at 5:49 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Wed, Jun 18, 2025 at 08:34:24AM +0800, Edgecombe, Rick P wrote:
> > > > > > > On Tue, 2025-06-17 at 01:09 -0700, Vishal Annapurve wrote:
> > > > > > > > Sorry I quoted Ackerley's response wrongly. Here is the correct reference [1].
> > > > > > >
> > > > > > > I'm confused...
> > > > > > >
> > > > > > > >
> > > > > > > > Speculative/transient refcounts came up a few times in the context of
> > > > > > > > guest_memfd discussions; examples include page table walkers,
> > > > > > > > page migration, speculative pagecache lookups, GUP-fast, etc. David H
> > > > > > > > can provide more context here as needed.
> > > > > > > >
> > > > > > > > Effectively, some core-mm features that are present today or might land
> > > > > > > > in the future can cause folio refcounts to be grabbed for short
> > > > > > > > durations without actual access to the underlying physical memory. These
> > > > > > > > scenarios are unlikely to happen for private memory but can't be
> > > > > > > > discounted completely.
> > > > > > >
> > > > > > > This means the refcount could be increased for other reasons, and so guest_memfd
> > > > > > > shouldn't rely on refcounts for its purposes? So, it is not a problem for other
> > > > > > > components handling the page to elevate the refcount?
> > > > > > Besides that, in [3], when kvm_gmem_convert_should_proceed() determines whether
> > > > > > to convert to private, why is it allowed to just invoke
> > > > > > kvm_gmem_has_safe_refcount() without taking speculative/transient refcounts into
> > > > > > account? Isn't it easier for shared pages to have speculative/transient
> > > > > > refcounts?
> > > > >
> > > > > These speculative refcounts are taken into account: in case of unsafe
> > > > > refcounts, the conversion operation immediately exits to userspace with
> > > > > EAGAIN, and userspace is supposed to retry the conversion.
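As an illustration of the retry contract described above, here is a minimal userspace-side sketch in C. `convert_range()` is a hypothetical stub standing in for the guest_memfd conversion request (it is not a real KVM ioctl); it reports -EAGAIN a fixed number of times to mimic a speculative/transient refcount that is later dropped. The caller bounds its retries, since a refcount that never drops would otherwise make it loop forever.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the guest_memfd conversion request; NOT a
 * real KVM API. It simulates a speculative/transient refcount that is
 * dropped after `transient_left` attempts.
 */
static int transient_left = 2;

static int convert_range(void)
{
	if (transient_left > 0) {
		transient_left--;
		return -EAGAIN;	/* unsafe refcount seen: caller should retry */
	}
	return 0;		/* refcount safe: conversion done */
}

/*
 * Retry the conversion while it reports -EAGAIN, at most max_tries
 * times. Returns 0 on success or the last error; *attempts reports how
 * many calls were made.
 */
static int convert_with_retry(int max_tries, int *attempts)
{
	int ret = -EAGAIN;

	*attempts = 0;
	while (ret == -EAGAIN && *attempts < max_tries) {
		(*attempts)++;
		ret = convert_range();
	}
	return ret;
}
```

With `transient_left` starting at 2, `convert_with_retry(5, &attempts)` returns 0 after three attempts; if the extra reference never goes away, the loop gives up after `max_tries` and returns -EAGAIN.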
> > > > Hmm, so why can't private-to-shared conversion also exit to userspace with
> > > > EAGAIN?
> > >
> > > How would userspace/guest_memfd differentiate between
> > > speculative/transient refcounts and extra refcounts due to TDX unmap
> > > failures?
> > Hmm, it also can't differentiate between speculative/transient refcounts and
> > extra refcounts on shared folios due to other reasons.
> >
>
> For shared memory ranges, userspace is effectively responsible
> for extra refcounts and can act to remove them if it has not
> already. If "extra" refcounts are taken care of, then the only
> remaining scenario is speculative/transient refcounts.
>
> But for private memory ranges, userspace is not responsible for any
> refcounts landing on them.
Ok. The similarities between the two are:
- userspace can't help with speculative/transient refcounts.
- userspace can't make the conversion succeed with "extra" refcounts, whether held
by userspace or by TDX.
But I think I get your point that EAGAIN is not the right code in case of
"extra" refcounts held by TDX.
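The distinction above can be sketched as an error-code choice. This is a minimal sketch in C, not the series' implementation: it assumes guest_memfd records a failed TDX unmap separately (the hypothetical `held_by_tdx` flag), because the refcount value alone cannot tell TDX-held references from speculative/transient ones. `struct refcount_state` and `conversion_error()` are illustrative names, and -EIO is only a placeholder for a non-retryable code; the thread does not pick one.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical summary of what guest_memfd knows about a folio when a
 * conversion is requested. Field names are illustrative only.
 */
struct refcount_state {
	int observed;		/* current folio refcount */
	int expected;		/* refs guest_memfd itself accounts for */
	bool held_by_tdx;	/* assumed: extra ref left by a failed TDX unmap */
};

/*
 * Pick the error for a conversion request. Only a state that a retry
 * might fix gets -EAGAIN. -EIO for the TDX case is a placeholder.
 */
static int conversion_error(const struct refcount_state *s)
{
	if (s->observed <= s->expected)
		return 0;		/* safe refcount: proceed */
	if (s->held_by_tdx)
		return -EIO;		/* retrying will not drop this ref */
	return -EAGAIN;			/* possibly transient: caller may retry */
}
```

Only the possibly transient case maps to -EAGAIN, so a userspace retry loop is triggered exactly when retrying has a chance to help.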