Re: [PATCH 3/4] mm: prevent KSM from completely breaking VMA merging

From: David Hildenbrand
Date: Mon May 19 2025 - 15:29:22 EST


On 19.05.25 21:26, Lorenzo Stoakes wrote:
On Mon, May 19, 2025 at 09:11:10PM +0200, David Hildenbrand wrote:
On 19.05.25 21:02, Lorenzo Stoakes wrote:
On Mon, May 19, 2025 at 08:04:22PM +0200, David Hildenbrand wrote:

+/*
+ * Are we guaranteed no driver can change state such as to preclude KSM merging?
+ * If so, let's set the KSM mergeable flag early so we don't break VMA merging.
+ *
+ * This is applicable when PR_SET_MEMORY_MERGE has been set on the mm_struct via
+ * prctl() causing newly mapped VMAs to have the KSM mergeable VMA flag set.
+ *
+ * If this is not the case, then we set the flag after considering mergeability,
+ * which will prevent mergeability as, when PR_SET_MEMORY_MERGE is set, a new
+ * VMA will not have the KSM mergeability VMA flag set, but all other VMAs will,
+ * preventing any merge.

Hmmm, so an ordinary MAP_PRIVATE of any file (executable etc.) will get
VM_MERGEABLE set but not be able to merge?

Probably these are not often expected to be merged ...

Preventing merging should really only happen because of VMA flags that
are getting set: VM_PFNMAP, VM_MIXEDMAP, VM_DONTEXPAND, VM_IO.


I am not 100% sure why we bail out on special mappings: all we have to
do is reliably identify anon pages, and we should be able to do that.

GUP does currently refuses any VM_PFNMAP | VM_IO, and KSM uses GUP,
which might need a tweak then (maybe the solution could be to ... not
use GUP but a folio_walk).

Oh, someone called "David" already did that. Nice :)

So we *should* be able to drop

* VM_PFNMAP: we correctly identify CoWed pages
* VM_MIXEDMAP: we correctly identify CoWed pages
* VM_IO: should not affect CoWed pages
* VM_DONTEXPAND: no idea why that should even matter here

I objected in the other thread but now realise I forgot we're talking about
MAP_PRIVATE... So we can do the CoW etc. Right.

Then we just need to be able to copy the thing on CoW... but what about
write-through etc. cache settings? I suppose we don't care once CoW'd...

Yes. It's ordinary kernel-managed memory.

Yeah, it's the CoW'd bit right? So it's fine.



But is this common enough of a use case to be worth the hassle of checking this
is all ok?

The reason I bring it up is because

1) Just because some drivers do weird mmap() things, we cannot merge any
MAP_PRIVATE file mappings (except shmem ;) and mmap_prepare).

2) The whole "early_ksm" checks/handling would go away, making this patch
significantly simpler :)

OK you're starting to convince me...

Maybe this isn't such a big deal if the KSM code handles it already anwyay.

If you're sure GUP isn't relying on this... it could be an additional patch
like:

'remove VM_SPECIAL limitation, KSM can already handle this'

And probably we _should_ let any insane driver blow up if they change stupid
things they must not change.

That is exactly my thinking.

But I'm also fine with doing that later, if you want to be careful.

--
Cheers,

David / dhildenb