Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()

From: Jason Gunthorpe
Date: Wed Jun 18 2025 - 09:24:08 EST


On Wed, Jun 18, 2025 at 08:19:28PM +0800, lizhe.67@xxxxxxxxxxxxx wrote:
> On Wed, 18 Jun 2025 08:56:22 -0300, jgg@xxxxxxxx wrote:
>
> > On Wed, Jun 18, 2025 at 01:52:37PM +0200, David Hildenbrand wrote:
> >
> > > I thought we also wanted to optimize out the
> > > is_invalid_reserved_pfn() check for each subpage of a folio.
>
> Yes, that is an important aspect of our optimization.
>
> > VFIO keeps a tracking structure for the ranges; you can record there
> > whether a reserved PFN was ever placed into this range and skip the
> > check entirely.
> >
> > It would be very rare for reserved PFNs and non-reserved ones to be
> > mixed within the same range; userspace could cause this, but nothing
> > should.
>
> Yes, but it seems we don't have a very straightforward interface to
> obtain the reserved attribute of this large range of pfns.

vfio_unmap_unpin() has the struct vfio_dma; you'd store the
indication there and pass it down.
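
For illustration, a minimal sketch of what that could look like, assuming
a new has_reserved flag in struct vfio_dma and a hypothetical helper
called from the pin path (both names are made up here, chosen to match
the rough diff further down):

/*
 * Hypothetical sketch only: remember at pin time whether any
 * reserved/invalid PFN was ever placed into this vfio_dma, so the
 * unmap path can skip the per-page is_invalid_reserved_pfn() checks
 * in the common case where none was.
 */
struct vfio_dma {
	/* ... existing fields: iova, vaddr, size, prot, pfn_list, ... */
	bool	has_reserved;	/* any reserved PFN ever mapped in this range */
};

/* Call from the pin path wherever is_invalid_reserved_pfn() already runs */
static void vfio_dma_note_pfn(struct vfio_dma *dma, unsigned long pfn)
{
	if (is_invalid_reserved_pfn(pfn))
		dma->has_reserved = true;
}

With that in place, dma->has_reserved is all the unmap side needs, as in
the sketch at the end of this mail.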

It already builds the longest run of physical contiguity here:

	for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
		next = iommu_iova_to_phys(domain->domain, iova + len);
		if (next != phys + len)
			break;
	}

And we pass down a physically contiguous range to
unmap_unpin_fast()/unmap_unpin_slow().

The only thing you need to do is detect reserved PFNs in
vfio_unmap_unpin(), keyed off an optimization flag stored in the dma,
and break up the above loop when it crosses a reserved boundary.

If you have a reserved range then just directly call iommu_unmap and
forget about any page pinning.

Then, on the page pinning side, you use the range version.
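
(For reference, the range version here is
unpin_user_page_range_dirty_lock(), declared in include/linux/mm.h
roughly as below; it releases npages gup-pinned pages starting at page
and optionally marks them dirty, instead of one per-page call at a
time.)

	void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
					      bool make_dirty);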

Something very approximately like the below. But again, I would
implore you to just use iommufd, which is already much better here.

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 1136d7ac6b597e..097b97c67e3f0d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -738,12 +738,13 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 	long unlocked = 0, locked = 0;
 	long i;
 
+	/* The caller has already ensured the pfn range is not reserved */
+	unpin_user_page_range_dirty_lock(pfn_to_page(pfn), npage,
+					 dma->prot & IOMMU_WRITE);
 	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
-		if (put_pfn(pfn++, dma->prot)) {
 			unlocked++;
 			if (vfio_find_vpfn(dma, iova))
 				locked++;
-		}
 	}
 
 	if (do_accounting)
@@ -1082,6 +1083,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	while (iova < end) {
 		size_t unmapped, len;
 		phys_addr_t phys, next;
+		bool reserved = false;
 
 		phys = iommu_iova_to_phys(domain->domain, iova);
 		if (WARN_ON(!phys)) {
@@ -1089,6 +1091,9 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			continue;
 		}
 
+		if (dma->has_reserved)
+			reserved = is_invalid_reserved_pfn(phys >> PAGE_SHIFT);
+
 		/*
 		 * To optimize for fewer iommu_unmap() calls, each of which
 		 * may require hardware cache flushing, try to find the
@@ -1098,21 +1103,31 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			next = iommu_iova_to_phys(domain->domain, iova + len);
 			if (next != phys + len)
 				break;
+			if (dma->has_reserved &&
+			    reserved != is_invalid_reserved_pfn(next >> PAGE_SHIFT))
+				break;
 		}
 
 		/*
 		 * First, try to use fast unmap/unpin. In case of failure,
 		 * switch to slow unmap/unpin path.
		 */
-		unmapped = unmap_unpin_fast(domain, dma, &iova, len, phys,
-					    &unlocked, &unmapped_region_list,
-					    &unmapped_region_cnt,
-					    &iotlb_gather);
-		if (!unmapped) {
-			unmapped = unmap_unpin_slow(domain, dma, &iova, len,
-						    phys, &unlocked);
-			if (WARN_ON(!unmapped))
-				break;
+		if (reserved) {
+			unmapped = iommu_unmap(domain->domain, iova, len);
+			iova += unmapped;
+		} else {
+			unmapped = unmap_unpin_fast(domain, dma, &iova, len,
+						    phys, &unlocked,
+						    &unmapped_region_list,
+						    &unmapped_region_cnt,
+						    &iotlb_gather);
+			if (!unmapped) {
+				unmapped = unmap_unpin_slow(domain, dma, &iova,
+							    len, phys,
+							    &unlocked);
+				if (WARN_ON(!unmapped))
+					break;
+			}
+		}
 	}