Re: [PATCH 6/8] KVM: gmem, x86: Add gmem hook for invalidating private memory

From: Xu Yilun
Date: Tue Oct 10 2023 - 05:19:04 EST


On 2023-08-15 at 10:18:53 -0700, isaku.yamahata@xxxxxxxxx wrote:
> From: Michael Roth <michael.roth@xxxxxxx>
>
> TODO: add a CONFIG option that can be to completely skip arch
> invalidation loop and avoid __weak references for arch/platforms that
> don't need an additional invalidation hook.
>
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort when freeing memory in
> response to FALLOC_FL_PUNCH_HOLE operations.
>
> Also issue invalidations of all allocated pages when releasing the gmem
> file so that the pages are not left in an unusable state when they get
> freed back to the host.
>
> Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@xxxxxxx
>

[...]

> +/* Handle arch-specific hooks needed before releasing guarded pages. */
> +static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
> + pgoff_t start, pgoff_t end)
> +{
> + pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
> + pgoff_t index = start;
> +
> + end = min(end, file_end);
> +
> + while (index < end) {
> + struct folio *folio;
> + unsigned int order;
> + struct page *page;
> + kvm_pfn_t pfn;
> +
> + folio = __filemap_get_folio(file->f_mapping, index,
> + FGP_LOCK, 0);
> + if (!folio) {
> + index++;
> + continue;
> + }
> +
> + page = folio_file_page(folio, index);
> + pfn = page_to_pfn(page);
> + order = folio_order(folio);
> +
> + kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));

Observed an issue there.

The valid page may not point to the first page in the folio, then the
range [pfn, pfn + (1ul << order)) expands to the next folio. This makes
a part of the pages be invalidated again when loop to the next folio.

On TDX, it causes TDH_PHYMEM_PAGE_WBINVD failed.

> +
> + index = folio_next_index(folio);
> + folio_unlock(folio);
> + folio_put(folio);
> +
> + cond_resched();
> + }
> +}

My fix would be:

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index e629782d73d5..3665003c3746 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -155,7 +155,7 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,

while (index < end) {
struct folio *folio;
- unsigned int order;
+ pgoff_t ntails;
struct page *page;
kvm_pfn_t pfn;

@@ -168,9 +168,9 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,

page = folio_file_page(folio, index);
pfn = page_to_pfn(page);
- order = folio_order(folio);
+ ntails = folio_nr_pages(folio) - folio_page_idx(folio, page);

- kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+ kvm_arch_gmem_invalidate(kvm, pfn, pfn + min(ntails, end - index));

index = folio_next_index(folio);
folio_unlock(folio);

Thanks,
Yilun