Re: [PATCH v1 3/4] mm: split folio_pte_batch() into folio_pte_batch() and folio_pte_batch_ext()

From: Lance Yang
Date: Fri Jun 27 2025 - 11:45:38 EST




On 2025/6/27 23:09, David Hildenbrand wrote:
On 27.06.25 16:19, Lance Yang wrote:
On Fri, Jun 27, 2025 at 7:55 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

Many users (including upcoming ones) don't really need the flags etc,
and can live with a function call.

So let's provide a basic, non-inlined folio_pte_batch().

In zap_present_ptes(), where we care about performance, the compiler
already seems to generate a call to a common inlined folio_pte_batch()
variant, shared with fork() code. So calling the new non-inlined variant
should not make a difference.

It's always an interesting dance with the compiler when it comes to inlining,
isn't it? We want the speed of 'inline' for critical paths, but also a compact
binary for the common case ...

This split is a nice solution to the classic 'inline' vs. code size dilemma ;p

Yeah, in particular when we primarily care about optimizing out all the unnecessary checks inside the function, not necessarily also inlining the function call itself.

If we ever realize we absolutely must inline it into a caller, we could turn folio_pte_batch_ext() into an "__always_inline", but for now it does not seem like this is really required from my experiments.

Right, that makes sense. No need to force "__always_inline" prematurely.
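
For readers following along, the split discussed above can be illustrated with a small userspace sketch. This is not the kernel code: struct entry, batch_ext(), batch(), BATCH_IGNORE_DIRTY and ENTRY_DIRTY are all hypothetical names invented for illustration, and the noinline attribute is the GCC/Clang spelling. The idea is that the extended variant stays inlinable, so a caller passing constant flags lets the compiler drop the checks it does not need, while the basic variant is a single out-of-line copy that everyone else simply calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-caller option and per-entry bit, for illustration only. */
#define BATCH_IGNORE_DIRTY (1u << 0)
#define ENTRY_DIRTY        (1u << 0)

/* Toy stand-in for a page table entry: a frame number plus status bits. */
struct entry {
	unsigned long pfn;
	unsigned int bits;
};

/*
 * Extended variant: inlinable, takes flags and an optional out-parameter.
 * When it is inlined with compile-time-constant flags and a NULL pointer,
 * the compiler can remove the branches that can never be taken.
 * Assumes max_nr >= 1 and that e[0] starts the batch.
 */
static inline size_t batch_ext(const struct entry *e, size_t max_nr,
			       unsigned int flags, bool *any_dirty)
{
	size_t nr = 1;

	if (any_dirty)
		*any_dirty = (e[0].bits & ENTRY_DIRTY) != 0;

	while (nr < max_nr) {
		/* Entries must map consecutive frames. */
		if (e[nr].pfn != e[0].pfn + nr)
			break;
		/* Unless told to ignore it, the dirty bit must match e[0]. */
		if (!(flags & BATCH_IGNORE_DIRTY) &&
		    ((e[nr].bits ^ e[0].bits) & ENTRY_DIRTY))
			break;
		if (any_dirty && (e[nr].bits & ENTRY_DIRTY))
			*any_dirty = true;
		nr++;
	}
	return nr;
}

/*
 * Basic variant: no flags, no out-parameters, one out-of-line copy.
 * Callers pay for a function call but do not duplicate the loop.
 */
__attribute__((noinline))
size_t batch(const struct entry *e, size_t max_nr)
{
	return batch_ext(e, max_nr, 0, NULL);
}
```

With entries for frames 100, 101, 102 (dirty) and 200, batch() stops after two entries because the dirty bit differs, while batch_ext() with BATCH_IGNORE_DIRTY covers three and reports a dirty entry through the out-parameter.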