Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

From: Xiaoyao Li
Date: Thu Jun 19 2025 - 05:46:29 EST


On 6/19/2025 5:28 PM, Yan Zhao wrote:
On Thu, Jun 19, 2025 at 05:18:44PM +0800, Xiaoyao Li wrote:
On 6/19/2025 4:59 PM, Xiaoyao Li wrote:
On 6/19/2025 4:13 PM, Yan Zhao wrote:
On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
Hello,

This patchset builds upon discussion at LPC 2024 and many guest_memfd
upstream calls to provide 1G page support for guest_memfd by taking
pages from HugeTLB.

This patchset is based on Linux v6.15-rc6, and requires the mmap support
for guest_memfd patchset (Thanks Fuad!) [1].

For ease of testing, this series is also available, stitched together,
at
https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2

Just to record an issue I found -- not one that must be fixed.

In TDX, the initial memory region is added as private memory during TD's
build time, with its initial content copied from source pages in shared
memory. The copy operation requires simultaneous access to both the shared
source memory and the private target memory.

Therefore, userspace cannot store the initial content in shared memory at
the mmap-ed VA of a guest_memfd that performs in-place conversion between
shared and private memory. This is because guest_memfd will first unmap a
PFN from the shared page tables and then check for any extra refcount held
on the shared PFN before converting it to private.
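To make the failure mode concrete, here is a toy C model of the check
described above. All names (struct toy_page, toy_convert_private(), the
single "base" reference owned by guest_memfd) are hypothetical and do not
mirror the actual guest_memfd code: conversion drops the shared mapping's
reference and then refuses with -EBUSY if any extra reference remains, for
example one taken by a GUP of the same page used as the copy source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical toy model of one guest_memfd page, not kernel code.
 * refcount: 1 for guest_memfd itself, +1 for the shared userspace
 * mapping, +1 for each GUP pin (e.g. TDX pinning its copy source).
 */
struct toy_page {
	int refcount;
	bool mapped_shared;	/* mapped at the guest_memfd mmap-ed VA */
	bool is_private;
};

#define TOY_GMEM_BASE_REFS 1	/* reference owned by guest_memfd */

/* Userspace mmap()s and touches the page: one mapping reference. */
static void toy_map_shared(struct toy_page *p)
{
	if (!p->mapped_shared) {
		p->mapped_shared = true;
		p->refcount++;
	}
}

/* Someone (e.g. TDX reading the initial content) pins the page. */
static void toy_gup_pin(struct toy_page *p)
{
	p->refcount++;
}

static void toy_unpin(struct toy_page *p)
{
	p->refcount--;
}

/*
 * Shared->private conversion: first unmap the PFN from the shared page
 * tables, then refuse if any extra reference is still held. The page
 * content is never touched. On failure the mapping is not restored; in
 * this model it would simply be refaulted on the next access.
 */
static int toy_convert_private(struct toy_page *p)
{
	if (p->mapped_shared) {
		p->mapped_shared = false;
		p->refcount--;
	}
	if (p->refcount > TOY_GMEM_BASE_REFS)
		return -EBUSY;
	p->is_private = true;
	return 0;
}
```

In this model, a page that is only mapped converts fine, while a page that
is also pinned as a copy source fails until the pin is dropped, which is the
conflict with TDX's build-time copy described above.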

I have an idea.

If I understand correctly, KVM_GMEM_CONVERT_PRIVATE for in-place
conversion unmaps the PFN from the shared page tables while keeping the
content of the page unchanged, right?
However, whenever TDX does a GUP to get the source page, there will be an
extra page refcount.

The GUP in TDX happens after gmem has converted the page to private.

From TDX's point of view, the physical page has already been converted to
private and contains the initial content. However, the content is not
usable by TDX until TDX performs an in-place TDH.MEM.PAGE.ADD.

So, for the non-CoCo case, KVM_GMEM_CONVERT_PRIVATE can actually be used
to initialize private memory: userspace first mmap()s the guest_memfd,
ensures the range is shared, and writes the initial content to it;
userspace then converts the range to private with KVM_GMEM_CONVERT_PRIVATE.
Therefore, the conversion request here will be declined.


For the CoCo case, such as TDX, TDX can hook into KVM_GMEM_CONVERT_PRIVATE
if it wants the private memory to be initialized with the initial content,
and simply do an in-place TDH.MEM.PAGE.ADD in the hook.

And maybe add a new flag to KVM_GMEM_CONVERT_PRIVATE so that userspace can
explicitly request that the page range be converted to private with its
content retained. That way, TDX can identify which cases need an in-place
TDH.MEM.PAGE.ADD.
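The proposed flag-plus-hook flow can be sketched as a toy C model.
Everything in it is hypothetical: TOY_CONVERT_RETAIN_CONTENT stands in for
the suggested new flag (no such flag exists in the RFC), and
toy_tdx_page_add_hook() only marks the page instead of issuing a real
in-place PAGE.ADD. The point is the control flow: the vendor hook runs
only when userspace explicitly asks for the content to be retained.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag: convert to private and keep the existing content. */
#define TOY_CONVERT_RETAIN_CONTENT (1u << 0)

struct toy_page {
	unsigned char data[16];	/* stand-in for the page content */
	bool is_private;
	bool added_to_td;	/* set by the hypothetical TDX hook */
};

/* Vendor hook type; for TDX this is where an in-place PAGE.ADD would go. */
typedef int (*toy_convert_hook)(struct toy_page *p);

/* Models an in-place add: content is left as-is, page is handed to the TD. */
static int toy_tdx_page_add_hook(struct toy_page *p)
{
	p->added_to_td = true;
	return 0;
}

/*
 * Shared->private conversion with an optional "retain content" request.
 * Without the flag the hook is skipped, so ordinary conversions never
 * trigger a PAGE.ADD.
 */
static int toy_convert_private(struct toy_page *p, unsigned int flags,
			       toy_convert_hook hook)
{
	if (p->is_private)
		return -EINVAL;	/* already private */
	p->is_private = true;
	if (!(flags & TOY_CONVERT_RETAIN_CONTENT))
		return 0;
	return hook ? hook(p) : 0;
}
```

Keeping the hook behind an explicit flag is one way to let TDX tell the
"initialize with this content" case apart from an ordinary conversion;
it is a sketch of the idea in this thread, not the RFC's interface.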