On 08/08/2025 08:45, David Hildenbrand wrote:
ACK; Dev, perhaps you can take another look at this and work up a patch to moreNot sure if some sleep has changed your mind on what "hint" means? I'm prettyThe last one is the important bit I think.
sure David named this function, but for me the name makes sense. The arch is
saying "I know that the pte batch is at least N ptes. It's up to you if you use
that information. I'll still work correctly if you ignore it".
For me, your interpretation of 'the most number of PTEs that _might_ coalesce'I'm not a native speaker, so I'll let both of you figure that out. To me it
would be a guess, not a hint.
makes sense as well ... but well, I was involved when creating that function. :)
It's actually ... surprisingly good after reading it again after at least a year.I understand the con PTE bit is a 'hint' but as I recall you saying atFWIW, this is the documentation for the function:
LSF/MM 'modern CPUs take the hint'. Which presumably is where this comes
from, but that's kinda deceptive.
Anyway the reason I was emphatic here is on the basis that I believe I had
this explained to met his way, which obviously I or whoever it was (don't
recall) must have misunderstood. Or perhaps I hallucinated it... :)
/**
* pte_batch_hint - Number of pages that can be added to batch without scanning.
* @ptep: Page table pointer for the entry.
* @pte: Page table entry.
*
* Some architectures know that a set of contiguous ptes all map the same
* contiguous memory with the same permissions. In this case, it can provide a
* hint to aid pte batching without the core code needing to scan every pte.
*
* An architecture implementation may ignore the PTE accessed state. Further,
* the dirty state must apply atomically to all the PTEs described by the hint.
*
* May be overridden by the architecture, else pte_batch_hint is always 1.
*/
[...]I see that folio_pte_batch() can get _more_, is this on the basis of there
being adjacent, physically contiguous contPTE entries that can also be
batched up?
I am probably to blame here, because I think I rejected early to have arm64-onlyI think for mprotect, the folio was only previously needed for the numa case. Improtect didn't? I mean let's check.IIRC prior to Dev's mprotect and mremap optimizations, I believe all sitesYup... I wonder about the other instances of this... ruh roh.
Not sure if that was discussed at some point before we went into the
direction of using folios. But there really doesn't seem to be anything
gained for other architectures here (as raised by Jann).
already needed the folio. I haven't actually looked at how mprotect ended up,
but maybe worth checking to see if it should protect with pte_batch_hint() too.
have a vague memory that either Dev of I proposed wrapping folio_pte_batch() to
only get the folio and call it if the next PTE had an adjacent PFN (or something
like that). But it was deemed to complex. I might be misremembering... could
have been an internal conversation. I'll chat with Dev about it and revisit.
optimization, assuming other arch could benefit here as well with batching. But
as it seems, batching in mremap() code really only serves the cont-pte managing
code, and the folio_pte_batch() is really entirely unnecessary.
In case of mprotect(), I think really only (a) NUMA and (b) anon-folio write-
upgrade required the folio. So it's a bit more tricky than mremap() here
where ... the folio is entirely irrelevant.
One could detect the "anon write-upgrade possible" case early as well, and only
lookup the folio in that case, otherwise use the straight pte hint.
So I think there is some room for further improvement.
agressively avoid vm_normal_folio() for mprotect?
Thanks,
Ryan