Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
From: Jason Gunthorpe
Date: Wed Jul 02 2025 - 19:33:21 EST
On Wed, Jul 02, 2025 at 04:58:46PM -0400, Peter Xu wrote:
> > So you have to do it the other way and pass the pgoff to the vmap so
> > the vmap ends up with the same colouring as a user VMA holding the
> > same pages..
>
> Not sure if I get that point, but.. it'll be hard to achieve at least.
>
> The vmap() happens when the io_uring instance is created (when the
> submit/complete queues are initialized). The mmap() happens later, and it
> can also happen multiple times, so all of the mmap()ed VAs need to share
> the same colouring with the vmap(). In this case it sounds reasonable to
> me to have the alignment done at mmap(), against the vmap() results.
The way this usually works is the memory is bound to a mmap "cookie"
- the pgoff - which userspace can use as many times as it likes.
Usually you know the thing being allocated will be mmap'd and what
its pgoff will be because it is 1:1 with the cookie/pgoff.
Didn't try to guess what io_uring has done here, but, IMHO, it would
be weird if the pgoffs are not 1:1 with the vmaps.
Since you said the pgoff was constant and not exchanged user/kernel
then presumably the vmap just needs to use that constant pgoff for its
colouring.
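
A rough sketch of the colouring arithmetic, in case it helps. This is
not kernel code: colour_align() and colour_of() are hypothetical names,
the 4K page size is an assumption, and the alias span is whatever the
architecture requires (SHMLBA on an aliasing cache). Any address picked
this way, whether for the vmap or for a user VMA, lands on the same
colour as the pgoff:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4K pages */

/* Colour of a byte address: its offset inside one alias span. */
static uint64_t colour_of(uint64_t addr, uint64_t colour_span)
{
    return addr & (colour_span - 1);
}

/*
 * Hypothetical helper: lowest address >= hint that has the same colour
 * as the byte offset (pgoff << PAGE_SHIFT). colour_span must be a
 * power of two.
 */
static uint64_t colour_align(uint64_t hint, uint64_t pgoff,
                             uint64_t colour_span)
{
    assert(colour_span && (colour_span & (colour_span - 1)) == 0);

    uint64_t want = colour_of(pgoff << PAGE_SHIFT, colour_span);
    uint64_t addr = (hint & ~(colour_span - 1)) + want;

    /* Rounding down may have gone below the hint; step one span up. */
    if (addr < hint)
        addr += colour_span;
    return addr;
}
```

If both the vmap address and every later mmap() address go through the
same calculation with the same constant pgoff, they all share a colour
without the mmap() side having to look at the vmap result.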
> > > The changes comparing to previous:
> > >
> > > (1) merged pgoff and *phys_pgoff parameters into one unsigned long, so
> > > the hook can adjust the pgoff for the va allocator to be used. The
> > > adjustment will not be visible to future mmap() when VMA is created.
> >
> > It seems functional, but the above is better, IMHO.
>
> Do you mean we can start with no modification allowed on *pgoff? I'd
> prefer having *pgoff modifiable from the start, as it'll not only work for
> io_uring / parisc above since the 1st day (so we don't need to introduce it
> on top, modifying existing users..), but it'll also be cleaner to be used
> in the current VFIO's use case.
I think a modifiable pgoff is really a weird concept... Especially if it
is only modified for the alignment calculation.
But if it is the only way, so be it.
Jason