Re: [PATCH v4 2/3] gup: introduce unpin_user_folio_dirty_locked()
From: Jason Gunthorpe
Date: Tue Jun 17 2025 - 11:23:00 EST
On Tue, Jun 17, 2025 at 04:04:26PM +0200, David Hildenbrand wrote:
> On 17.06.25 15:58, David Hildenbrand wrote:
> > On 17.06.25 15:45, David Hildenbrand wrote:
> > > On 17.06.25 15:42, Jason Gunthorpe wrote:
> > > > On Tue, Jun 17, 2025 at 12:18:20PM +0800, lizhe.67@xxxxxxxxxxxxx wrote:
> > > >
> > > > > @@ -360,12 +360,7 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages,
> > > > >  	for (i = 0; i < npages; i += nr) {
> > > > >  		folio = gup_folio_range_next(page, npages, i, &nr);
> > > > > -		if (make_dirty && !folio_test_dirty(folio)) {
> > > > > -			folio_lock(folio);
> > > > > -			folio_mark_dirty(folio);
> > > > > -			folio_unlock(folio);
> > > > > -		}
> > > > > -		gup_put_folio(folio, nr, FOLL_PIN);
> > > > > +		unpin_user_folio_dirty_locked(folio, nr, make_dirty);
> > > > >  	}
> > > >
> > > > I don't think we should call an exported function here - this is a
> > > > fast path for rdma and iommufd, I don't want to see it degrade to save
> > > > three duplicated lines :\
> > >
> > > Any way to quantify? In theory, the compiler could still optimize this
> > > within the same file, no?
> >
> > Looking at the compiler output, I think the compiler is doing exactly that.
> >
> > Unless my objdump -D -S analysis skills are seriously degraded :)
>
> FWIW, while already looking at this, even before this change, the compiler
> does not inline gup_put_folio() into this function, which is a bit
> unexpected.

Weird, but I would not expect this as a general rule, and I'm not sure
we should rely on it.

I would say exported functions should not get automatically
inlined. That throws all the kprobes into chaos :\

BTW, why can't the other patches in this series just use
unpin_user_page_range_dirty_lock()? The way this stuff is supposed to
work is to combine adjacent physical addresses and then invoke
unpin_user_page_range_dirty_lock() on the start page of the physical
range. This is why we have gup_folio_range_next(), which does the
segmentation in an efficient way.

Combining adjacent physical addresses is basically free math.
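
To illustrate the "free math" point, here is a minimal userspace sketch
of the coalescing step. It is not code from this series: struct
pfn_range and coalesce_pfns() are hypothetical names, and plain
unsigned long values stand in for PFNs. In the kernel, the caller would
then hand each run to unpin_user_page_range_dirty_lock() with the
run's start page and page count.

```c
#include <stddef.h>

/* Hypothetical type: one physically contiguous run of pinned pages. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long npages;
};

/*
 * Hypothetical helper: merge adjacent PFNs into contiguous runs.
 * Stores at most max_out runs in out[] and returns how many were
 * stored. Input that would need more runs is truncated.
 */
size_t coalesce_pfns(const unsigned long *pfns, size_t n,
		     struct pfn_range *out, size_t max_out)
{
	size_t nranges = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct pfn_range *last = nranges ? &out[nranges - 1] : NULL;

		/* One add and one compare per page extends the current run. */
		if (last && last->start_pfn + last->npages == pfns[i]) {
			last->npages++;
			continue;
		}

		if (nranges == max_out)
			break;

		/* Not adjacent to the previous run: start a new one. */
		out[nranges].start_pfn = pfns[i];
		out[nranges].npages = 1;
		nranges++;
	}

	return nranges;
}
```

With PFNs 100, 101, 102, 200, 201, 300 this yields three runs, so three
unpin calls instead of six, and the folio segmentation inside each run
stays in gup_folio_range_next().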

Segmenting to folios on the vfio side doesn't make a lot of sense,
IMHO.

Jason