If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.
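
As a minimal userspace sketch of that opt-in (the helper names are
hypothetical, and the fallback constants are assumed to match the Linux UAPI
header for userspace headers that predate the operation):

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers (assumed UAPI values). */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/*
 * Hypothetical helper: opt the calling process, and processes it later
 * fork/exec's, into KSM. Returns 0 on success or a negative errno, for
 * example on kernels that lack the operation or are built without KSM.
 */
int ksm_enable_for_process(void)
{
	if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0) == -1)
		return -errno;
	return 0;
}

/*
 * Hypothetical helper: query the current setting. Returns 1 if enabled,
 * 0 if not, or a negative errno on failure.
 */
int ksm_enabled_for_process(void)
{
	int ret = prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0);

	return ret == -1 ? -errno : ret;
}
```

A caller would typically invoke ksm_enable_for_process() once, early, before
creating the mappings it wants scanned, and treat a negative return as "KSM
not available" rather than as a fatal error.
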
This defaults all newly mapped VMAs to have the VM_MERGEABLE VMA flag set
(in order to indicate they are KSM mergeable), as well as setting this flag
for all existing VMAs.
However, it also completely breaks VMA merging for the process and all
forked (and fork/exec'd) processes.
This is because when a new mapping is proposed, the flags specified will
never have VM_MERGEABLE set. However all adjacent VMAs will already have
VM_MERGEABLE set, rendering VMAs unmergeable by default.
To work around this, we try to set the VM_MERGEABLE flag prior to
attempting a merge. In the case of brk() this can always be done.
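
The flag mismatch, and the effect of setting the flag first, can be
illustrated with a toy userspace model. This is only a sketch: the model_*
names and flag values are invented for illustration, and the real merge
logic checks considerably more than flags and adjacency.

```c
#include <stdbool.h>

/* Illustrative flag bits; the kernel's real values differ. */
#define MODEL_VM_READ      0x1UL
#define MODEL_VM_WRITE     0x2UL
#define MODEL_VM_MERGEABLE 0x4UL

/* A toy VMA: an address range plus its flags. */
struct model_vma {
	unsigned long start;
	unsigned long end;
	unsigned long flags;
};

/* Toy merge rule: the new range must abut prev and carry identical flags. */
bool model_can_merge(const struct model_vma *prev, unsigned long new_start,
		     unsigned long new_flags)
{
	return prev->end == new_start && prev->flags == new_flags;
}

/* Toy workaround: add the KSM flag to the proposed flags before merging. */
unsigned long model_ksm_flags(bool process_merge_any, unsigned long flags)
{
	return process_merge_any ? flags | MODEL_VM_MERGEABLE : flags;
}
```

With an existing VMA flagged MODEL_VM_READ | MODEL_VM_WRITE |
MODEL_VM_MERGEABLE, a proposed adjacent mapping requesting only
MODEL_VM_READ | MODEL_VM_WRITE fails model_can_merge(), whereas passing the
requested flags through model_ksm_flags() first makes the comparison succeed.
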
However on mmap() things are more complicated - while KSM is not supported
for MAP_SHARED file-backed mappings, it is supported for MAP_PRIVATE
file-backed mappings.
And these mappings may have deprecated .mmap() callbacks specified which
could, in theory, adjust flags and thus KSM eligibility.
This is unlikely to cause an issue on merge, as any adjacent file-backed
mappings would already have the same post-.mmap() callback attributes, and
thus would naturally not be merged.
But for the purposes of establishing a VMA as KSM-eligible (as well as
initially scanning the VMA), this is potentially very problematic.
So we check to determine whether this is possible at all. If not, we set
VM_MERGEABLE prior to the merge attempt on mmap(), otherwise we retain the
previous behaviour.
When .mmap_prepare() is more widely used, we can remove this precaution.
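
The shape of that check can be sketched as a small userspace model. The
struct and function names here are hypothetical, and treating shmem as safe
is an assumption of this sketch rather than something stated above; only the
anonymous and .mmap_prepare() cases follow directly from the description.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy description of the file behind a proposed mapping (hypothetical). */
struct model_file {
	bool has_mmap_prepare;	/* driver provides .mmap_prepare() */
	bool is_shmem;		/* file belongs to shmem */
};

/*
 * Returns true if the KSM flag may be applied before the merge attempt,
 * i.e. no legacy .mmap() callback can change flags afterwards.
 */
bool model_can_set_ksm_flags_early(const struct model_file *file)
{
	/* Anonymous mapping: no driver callback is involved at all. */
	if (file == NULL)
		return true;

	/* .mmap_prepare() has already run by the time flags are final. */
	if (file->has_mmap_prepare)
		return true;

	/* Assumption of this sketch: shmem does not alter KSM eligibility. */
	if (file->is_shmem)
		return true;

	/* Any other legacy .mmap() callback: keep the previous behaviour. */
	return false;
}
```

In this model, once every driver provides .mmap_prepare(), the final branch
becomes unreachable for file-backed mappings and the check can go away.
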
While this doesn't quite cover all cases, it covers a great many (all
anonymous memory, for instance), meaning we should already see a
significant improvement in VMA mergeability.
When it comes to file-backed mappings (other than shmem), we are really
only interested in MAP_PRIVATE mappings, which have an available anon page
by default. Therefore, the VM_SPECIAL restriction makes less sense for
KSM.
In a future series we therefore intend to remove this limitation, which
ought to simplify this implementation. However it makes sense to defer
doing so until a later stage so we can first address this mergeability
issue.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
Fixes: d7597f59d1d3 ("mm: add new api to enable ksm per process") # please no backport!
Reviewed-by: Chengming Zhou <chengming.zhou@xxxxxxxxx>