Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
From: Yan Zhao
Date: Thu Jun 19 2025 - 05:31:32 EST
On Thu, Jun 19, 2025 at 05:18:44PM +0800, Xiaoyao Li wrote:
> On 6/19/2025 4:59 PM, Xiaoyao Li wrote:
> > On 6/19/2025 4:13 PM, Yan Zhao wrote:
> > > On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
> > > > Hello,
> > > >
> > > > This patchset builds upon discussion at LPC 2024 and many guest_memfd
> > > > upstream calls to provide 1G page support for guest_memfd by taking
> > > > pages from HugeTLB.
> > > >
> > > > This patchset is based on Linux v6.15-rc6, and requires the mmap support
> > > > for guest_memfd patchset (Thanks Fuad!) [1].
> > > >
> > > > For ease of testing, this series is also available, stitched together,
> > > > at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
> > > Just to record a found issue -- not one that must be fixed.
> > >
> > > In TDX, the initial memory region is added as private memory during TD's
> > > build time, with its initial content copied from source pages in shared
> > > memory. The copy operation requires simultaneous access to both shared
> > > source memory and private target memory.
> > >
> > > Therefore, userspace cannot store the initial content in shared memory at
> > > the mmap-ed VA of a guest_memfd that performs in-place conversion between
> > > shared and private memory. This is because the guest_memfd will first unmap
> > > a PFN in shared page tables and then check for any extra refcount held for
> > > the shared PFN before converting it to private.
> >
> > I have an idea.
> >
> > If I understand correctly, the KVM_GMEM_CONVERT_PRIVATE of in-place
> > conversion unmaps the PFN in shared page tables while keeping the content
> > of the page unchanged, right?
However, whenever there's a GUP in TDX to get the source page, there will be an
extra page refcount.
> > So KVM_GMEM_CONVERT_PRIVATE can actually be used to initialize the private
> > memory for the non-CoCo case: userspace first mmap()s it, ensures it's
> > shared, and writes the initial content to it; after that, userspace
> > converts it to private with KVM_GMEM_CONVERT_PRIVATE.
The conversion request here will therefore be declined, because of that extra
refcount.
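
To make the ordering problem concrete, here is a toy userspace model of the
sequence (not kernel code and not taken from the series; all toy_* names, the
refcount accounting and the -EAGAIN return value are assumptions made up for
illustration). It only mimics "unmap from shared page tables, then refuse if an
extra refcount remains":

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for one guest_memfd page. Hypothetical, for illustration. */
struct toy_page {
	int refcount;        /* TOY_BASE_REFS == held only by guest_memfd */
	int shared_mappings; /* userspace PTEs currently mapping the page */
	bool is_private;
};

#define TOY_BASE_REFS 1

/* Userspace mmap()s and faults in the page: one mapping, one reference. */
static void toy_mmap(struct toy_page *p)
{
	p->shared_mappings++;
	p->refcount++;
}

/* A GUP on the source VA (as in TD build) takes an extra reference. */
static void toy_gup(struct toy_page *p)
{
	p->refcount++;
}

/* Drop a reference previously taken by toy_gup(). */
static void toy_put(struct toy_page *p)
{
	assert(p->refcount > TOY_BASE_REFS);
	p->refcount--;
}

/* Shared -> private conversion, modelled as two steps. */
static int toy_convert_private(struct toy_page *p)
{
	/* Step 1: unmap from shared page tables; mappings drop their refs. */
	p->refcount -= p->shared_mappings;
	p->shared_mappings = 0;

	/* Step 2: any remaining extra reference declines the conversion. */
	if (p->refcount > TOY_BASE_REFS)
		return -EAGAIN; /* error code is an arbitrary choice here */

	p->is_private = true;
	return 0;
}
```

In this model, calling toy_mmap() and toy_gup() and then
toy_convert_private() fails until toy_put() drops the GUP reference, which is
the situation the source page ends up in while it is still needed for the copy.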
> > For the CoCo case, like TDX, it can hook into KVM_GMEM_CONVERT_PRIVATE if
> > it wants the private memory to be initialized with initial content, and
> > just do in-place TDH.PAGE.ADD in the hook.
>
> And maybe a new flag for KVM_GMEM_CONVERT_PRIVATE for user space to
> explicitly request that the page range is converted to private and the
> content needs to be retained. So that TDX can identify which case needs to
> call in-place TDH.PAGE.ADD.
>