Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages
From: Vishal Annapurve
Date: Tue Jul 01 2025 - 03:13:56 EST
On Mon, Jun 30, 2025 at 11:06 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> On Mon, Jun 30, 2025 at 10:22:26PM -0700, Vishal Annapurve wrote:
> > On Mon, Jun 30, 2025 at 10:04 PM Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > >
> > > On Tue, Jul 01, 2025 at 05:45:54AM +0800, Edgecombe, Rick P wrote:
> > > > On Mon, 2025-06-30 at 12:25 -0700, Ackerley Tng wrote:
> > > > > > So for this we can do something similar. Have the arch/x86 side of TDX grow a
> > > > > > new tdx_buggy_shutdown(). Have it do an all-cpu IPI to kick CPUs out of
> > > > > > SEAMMODE, wbinvd, and set a "no more seamcalls" bool. Then any SEAMCALLs after
> > > > > > that will return a TDX_BUGGY_SHUTDOWN error, or similar. All TDs in the system
> > > > > > die. Zap/cleanup paths return success in the buggy shutdown case.
> > > > > >
> > > > >
> > > > > Do you mean that on unmap/split failure:
> > > >
> > > > Maybe Yan can clarify here. I thought the HWpoison scenario was about TDX module
> > > > My thinking is to set HWPoison on private pages whenever KVM_BUG_ON() is hit in
> > > > TDX, i.e., when the page is still mapped in S-EPT but the TD is bugged and
> > > > about to be torn down.
> > >
> > > So, it could be due to KVM or TDX module bugs, which retries can't help.
> > >
> > > > bugs. Not TDX busy errors, demote failures, etc. If there are "normal" failures,
> > > > like the ones that can be fixed with retries, then I think HWPoison is not a
> > > > good option though.
> > > >
> > > > > there is a way to make 100%
> > > > > sure all memory becomes re-usable by the rest of the host, using
> > > > > tdx_buggy_shutdown(), wbinvd, etc?
> > >
> > > Not sure about this approach. When TDX module is buggy and the page is still
> > > accessible to guest as private pages, even with no-more SEAMCALLs flag, is it
> > > safe enough for guest_memfd/hugetlb to re-assign the page to allow simultaneous
> > > access in shared memory with potential private access from TD or TDX module?
> >
> > If no more seamcalls are allowed and all cpus are made to exit SEAM
> > mode then how can there be potential private access from TD or TDX
> > module?
> Not sure. As Kirill said "TDX module has creative ways to corrupt it"
> https://lore.kernel.org/all/zlxgzuoqwrbuf54wfqycnuxzxz2yduqtsjinr5uq4ss7iuk2rt@qaaolzwsy6ki/.
I would assume that would be true only if TDX module logic is allowed
to execute. Otherwise it would be useful to understand these
"creative" ways better.
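
To make the flow under discussion concrete, here is a rough user-space model of the tdx_buggy_shutdown() idea quoted above: set a global "no more seamcalls" flag, fail every later SEAMCALL with a shutdown error, and let zap/cleanup treat that error as success. This is only a sketch under stated assumptions, not kernel code. The error values, the guarded_seamcall() wrapper and the model_* helpers are hypothetical names invented for illustration, and the all-CPU IPI and wbinvd steps are reduced to an empty stub.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the thread only says "TDX_BUGGY_SHUTDOWN error, or similar". */
enum {
	MODEL_TDX_SUCCESS        = 0,
	MODEL_TDX_OTHER_ERROR    = 1,
	MODEL_TDX_BUGGY_SHUTDOWN = 2,
};

/* The "no more seamcalls" bool from the proposal (static, so it starts out false). */
static atomic_bool tdx_no_more_seamcalls;

typedef uint64_t (*seamcall_fn)(uint64_t leaf);

/* Stub: a real version would IPI all CPUs out of SEAM mode and run wbinvd. */
static void model_kick_cpus_and_flush(void)
{
}

/* Modelled after the proposed tdx_buggy_shutdown(): kick CPUs, then latch the flag. */
void tdx_buggy_shutdown(void)
{
	model_kick_cpus_and_flush();
	atomic_store(&tdx_no_more_seamcalls, true);
}

/* Hypothetical wrapper: once the flag is set, the TDX module is never entered again. */
uint64_t guarded_seamcall(seamcall_fn fn, uint64_t leaf)
{
	assert(fn != NULL);
	if (atomic_load(&tdx_no_more_seamcalls))
		return MODEL_TDX_BUGGY_SHUTDOWN;
	return fn(leaf);
}

/* Zap/cleanup path: returns 0 on success and also in the buggy shutdown case. */
int model_zap_private_page(seamcall_fn fn, uint64_t leaf)
{
	uint64_t err = guarded_seamcall(fn, leaf);

	if (err == MODEL_TDX_SUCCESS || err == MODEL_TDX_BUGGY_SHUTDOWN)
		return 0;
	return -1;
}

/* Stand-in for a misbehaving TDX module that fails every call. */
uint64_t model_failing_seamcall(uint64_t leaf)
{
	(void)leaf;
	return MODEL_TDX_OTHER_ERROR;
}
```

In this model a zap against the failing module returns an error before tdx_buggy_shutdown() and success afterwards, because the wrapper never calls into the module once the flag is set. Whether that is enough to rule out further private accesses on real hardware is exactly the open question in this thread.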