Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

From: Yan Zhao
Date: Tue Jun 17 2025 - 20:22:48 EST


On Tue, Jun 17, 2025 at 11:52:48PM +0800, Edgecombe, Rick P wrote:
> On Tue, 2025-06-17 at 09:38 +0800, Yan Zhao wrote:
> > > We talked about doing something like having tdx_hold_page_on_error() in
> > > guestmemfd with a proper name. The separation of concerns will be better if
> > > we can just tell guestmemfd, the page has an issue. Then guestmemfd can
> > > decide how to handle it (refcount or whatever).
> > Instead of using tdx_hold_page_on_error(), the advantage of informing
> > guest_memfd that TDX is holding a page at 4KB granularity is that, even if
> > there is a bug in KVM (such as forgetting to notify TDX to remove a mapping
> > in handle_removed_pt()), guest_memfd would be aware that the page remains
> > mapped in the TDX module. This allows guest_memfd to determine how to handle
> > the problematic page (whether through refcount adjustments or other methods)
> > before truncating it.
>
> I don't think a potential bug in KVM is a good enough reason. If we are
> concerned can we think about a warning instead?
>
> In the past, we talked about enhancing KASAN to know when a page is mapped
> into S-EPT. So rather than design around potential bugs we could focus on
> having a simpler implementation with the infrastructure to catch and fix the
> bugs.
However, that is fine only if failing to remove a guest private page merely
causes a memory leak.
If TDX does not hold any refcount, guest_memfd has to know which private pages
are still mapped. Otherwise, a page may be reassigned to another kernel
component while it is still mapped in the S-EPT.
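
To make the concern concrete, here is a minimal userspace sketch of the
per-GFN counting idea quoted below. It is a model only, not kernel code: the
struct and the model_*() helpers are hypothetical stand-ins for
guest_memfd_add_page_ref_count()/guest_memfd_dec_page_ref_count(), and it
assumes one 2MB folio with a single filemap reference.

```c
/* Userspace model only; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 512	/* 4KB pages in one 2MB folio */

struct folio_model {
	int refcount;				/* stands in for the real folio refcount */
	unsigned int sept_refs[MODEL_NR_PAGES];	/* per-4KB "mapped in S-EPT" count */
};

static void model_init(struct folio_model *f)
{
	size_t i;

	f->refcount = 1;	/* the filemap's reference */
	for (i = 0; i < MODEL_NR_PAGES; i++)
		f->sept_refs[i] = 0;
}

/* TDX mapped the 4KB page at index idx. */
static void model_add_page_ref(struct folio_model *f, size_t idx)
{
	assert(idx < MODEL_NR_PAGES);
	f->sept_refs[idx]++;
}

/* TDX successfully unmapped the 4KB page at index idx. */
static void model_dec_page_ref(struct folio_model *f, size_t idx)
{
	assert(idx < MODEL_NR_PAGES && f->sept_refs[idx] > 0);
	f->sept_refs[idx]--;
}

/*
 * Truncation: fold any leftover internal counts into the folio refcount,
 * then drop the filemap's reference. Returns true only if the folio may be
 * freed, i.e. nothing is still mapped in the S-EPT.
 */
static bool model_truncate(struct folio_model *f)
{
	size_t i;

	for (i = 0; i < MODEL_NR_PAGES; i++)
		f->refcount += (int)f->sept_refs[i];
	f->refcount--;
	return f->refcount == 0;
}
```

In this model, a page whose unmapping was skipped or failed keeps the folio
alive after truncation, so the outcome is a leak rather than a reassignment.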


> >
> > > >
> > > > This would allow guest_memfd to maintain an internal reference count for
> > > > each private GFN. TDX would call guest_memfd_add_page_ref_count() for
> > > > mapping and guest_memfd_dec_page_ref_count() after a successful unmapping.
> > > > Before truncating a private page from the filemap, guest_memfd could
> > > > increase the real folio reference count based on its internal reference
> > > > count for the private GFN.
> > >
> > > What does this get us exactly? This is the argument to have less error prone
> > > code that can survive forgetting to refcount on error? I don't see that it
> > > is an especially special case.
> > Yes, for less error-prone code.
> >
> > If this approach is considered too complex for an initial implementation,
> > using tdx_hold_page_on_error() is also a viable option.
>
> I'm saying I don't think it's a good enough reason. Why is it different than
> other use-after-free bugs? I feel like I'm missing something.
tdx_hold_page_on_error() could be implemented as follows: on removal failure,
invoke a guest_memfd interface to let guest_memfd know the exact ranges still
in use by the TDX module due to unmapping failures.
Do you think that's OK?
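
A rough userspace sketch of what such an interface might look like, purely for
discussion. Everything here is an assumption: the struct, the fixed-size range
table, and the model_*() names are hypothetical, and a real implementation
would need locking and a proper data structure.

```c
/* Userspace model only; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_RANGES 16

struct gfn_range {
	uint64_t start;		/* first GFN still held by the TDX module */
	uint64_t nr_pages;	/* number of 4KB pages in the range */
};

struct gmem_model {
	struct gfn_range in_use[MODEL_MAX_RANGES];
	size_t nr_ranges;
};

/*
 * Called by the TDX side when unmapping [start, start + nr_pages) failed.
 * Returns false if the table is full and the range could not be recorded.
 */
static bool model_report_unmap_failure(struct gmem_model *g,
				       uint64_t start, uint64_t nr_pages)
{
	assert(nr_pages > 0);
	if (g->nr_ranges == MODEL_MAX_RANGES)
		return false;
	g->in_use[g->nr_ranges].start = start;
	g->in_use[g->nr_ranges].nr_pages = nr_pages;
	g->nr_ranges++;
	return true;
}

/* Does any recorded range cover this GFN? */
static bool model_gfn_in_use(const struct gmem_model *g, uint64_t gfn)
{
	size_t i;

	for (i = 0; i < g->nr_ranges; i++) {
		const struct gfn_range *r = &g->in_use[i];

		if (gfn >= r->start && gfn - r->start < r->nr_pages)
			return true;
	}
	return false;
}

/*
 * Truncation of [start, start + nr_pages): count the pages that are safe to
 * free. Pages still in use are skipped; how to hold them (refcount or
 * otherwise) is left to guest_memfd.
 */
static uint64_t model_truncate_range(const struct gmem_model *g,
				     uint64_t start, uint64_t nr_pages)
{
	uint64_t gfn, freed = 0;

	for (gfn = start; gfn < start + nr_pages; gfn++)
		if (!model_gfn_in_use(g, gfn))
			freed++;
	return freed;
}
```

The point of the sketch is the division of labour: TDX only reports the
failed range, and guest_memfd alone decides what to do with those pages at
truncation time.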