Re: [PATCH 5/5] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings

From: Jason Gunthorpe
Date: Tue Jun 24 2025 - 19:40:52 EST


On Tue, Jun 24, 2025 at 04:37:26PM -0400, Peter Xu wrote:
> On Thu, Jun 19, 2025 at 03:40:41PM -0300, Jason Gunthorpe wrote:
> > Even with this new version you have to decide to return PUD_SIZE or
> > bar_size in pci and your same reasoning that PUD_SIZE make sense
> > applies (though I would probably return bar_size and just let the core
> > code cap it to PUD_SIZE)
>
> Yes.
>
> Today I went back to look at this, I was trying to introduce this for
> file_operations:
>
> int (*get_mapping_order)(struct file *, unsigned long, size_t);
>
> It looks almost good, except that it so far has no way to return the
> physical address for further calculation on the alignment.
>
> For THP, VA is always calculated against pgoff not physical address on the
> alignment. I think it's OK for THP, because every 2M THP folio will be
> naturally 2M aligned on the physical address, so it fits when e.g. pgoff=0
> in the calculation of thp_get_unmapped_area_vmflags().
>
> Logically it should even also work for vfio-pci, as long as VFIO keeps
> using the lower 40 bits of the device_fd to represent the bar offset,
> meanwhile it'll also require PCIe spec asking the PCI bars to be mapped
> aligned with bar sizes.
>
> But from an API POV, get_mapping_order() logically should return something
> for further calculation of the alignment to get the VA. pgoff here may not
> always be the right thing to use to align to the VA: after all, pgtable
> mapping is about VA -> PA, the only reasonable and reliable way is to align
> VA to the PA to be mapped, and as an API we shouldn't assume pgoff is
> always aligned to PA address space.

My feeling, and the reason I used the phrase "pgoff aligned address",
is that the owner of the file should already ensure that for the large
PTEs/folios:
    pgoff % 2**order == 0
    physical % 2**order == 0
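
As a rough userspace sketch of that pair of conditions (not kernel code):
order_is_mappable() is a hypothetical helper name, and I'm assuming both
the file offset and the physical address are expressed in PAGE_SIZE units
(a pgoff and a pfn) and that order is below 64.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not an existing kernel API.
 * Returns true when both the file page offset and the physical frame
 * number are multiples of 2**order pages, i.e. when a mapping of that
 * order could be installed without any extra offset adjustment.
 * Assumes order < 64.
 */
static bool order_is_mappable(uint64_t pgoff, uint64_t pfn, unsigned int order)
{
	uint64_t mask = (UINT64_C(1) << order) - 1;

	return (pgoff & mask) == 0 && (pfn & mask) == 0;
}
```

With 4K pages, order 9 would correspond to a 2M mapping and order 18 to a
1G mapping.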

So, things like VFIO do need to hand out high alignment pgoffs to make
this work - which it already does.
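
To illustrate what "pgoff aligned address" could mean on the VA side, here
is a minimal sketch, assuming 4K pages and a search area that was already
padded by one mapping size. align_va_to_pgoff() and SKETCH_PAGE_SHIFT are
made-up names for illustration, not existing kernel symbols, and the core
code may well do this differently.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4K base pages */

/*
 * Hypothetical helper, not an existing kernel API.
 * Given the start of a free VA range (base), return the lowest address
 * at or above base that is congruent to the file offset modulo the
 * mapping size. The caller is assumed to have reserved at least one
 * extra mapping size of room so the adjusted address still fits.
 */
static uint64_t align_va_to_pgoff(uint64_t base, uint64_t pgoff,
				  unsigned int order)
{
	uint64_t size = UINT64_C(1) << (order + SKETCH_PAGE_SHIFT);
	uint64_t off = pgoff << SKETCH_PAGE_SHIFT;

	/* Unsigned wraparound makes (off - base) a valid residue mod size. */
	return base + ((off - base) & (size - 1));
}
```

If the file owner keeps pgoff and the physical address aligned together as
above, a VA chosen this way ends up aligned against the physical address
too, without the API ever returning one.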

To me this just keeps things simpler. I guess if someone comes up with
a case where they really can't get a pgoff alignment and really need a
high order mapping then maybe we can add a new return field of some
kind (pgoff adjustment?) but that is so weird I'd leave it to the
future person to come and justify it.

Jason