Re: [PATCH 1/2] mmap.2: clarify MAP_LOCKED semantic

From: Peter Zijlstra
Date: Wed May 11 2016 - 07:07:45 EST

On 05/13/2015 04:38 PM, Michal Hocko wrote:
From: Michal Hocko <mhocko@xxxxxxx>

MAP_LOCKED has had subtly different semantics from mmap(2)+mlock(2) ever
since it was introduced.
mlock(2) fails if the memory range cannot be populated, because it has to
guarantee that no future major faults will happen on the range.
mmap(MAP_LOCKED), on the other hand, silently succeeds even if the range
was populated only partially.
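One way to observe this from userspace is to check residency after mmap(MAP_LOCKED) returns. A minimal sketch, assuming Linux and mincore(2); the helper name count_resident_pages is illustrative only, and the result is just a snapshot taken at call time (it neither populates nor locks anything):

```c
#define _DEFAULT_SOURCE
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Illustrative helper (not an existing API): count how many pages of
 * [addr, addr + len) are currently resident, using mincore(2).
 * addr must be page aligned. Returns -1 on error. */
static long count_resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages ? npages : 1);
    if (vec == NULL)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;   /* low bit set => page is resident */

    free(vec);
    return resident;
}
```

For a MAP_LOCKED mapping of len bytes, a result smaller than the page count would indicate the partial population described above, which the successful mmap() return value alone does not reveal.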

Fixing this subtle difference in the kernel is rather awkward because
the memory population happens after the mm locks have been dropped, so
the cleanup before returning failure (munlock) could operate on something
other than the originally mapped area.

E.g. a speculative userspace page fault handler catching SIGSEGV and doing
mmap(fault_addr, MAP_FIXED|MAP_LOCKED) might discard a portion of a racing
mmap and lead to lost data. Although it is not clear whether such usage
would be valid, the mmap man page doesn't explicitly describe requirements
for threaded applications, so we cannot exclude this possibility.

This patch makes the semantics of MAP_LOCKED explicit and suggests using
mmap + mlock as the only way to guarantee no later major page faults.
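The mmap + mlock pattern suggested in the patch can be sketched roughly as below. This is a minimal sketch, assuming Linux and an anonymous private mapping; map_and_lock is a hypothetical helper name, not an existing API. Unlike the kernel, userspace can clean up after a failed mlock here, because the kernel chose the address (no MAP_FIXED) and, by assumption, no other thread touches the range in between.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map len bytes of anonymous memory, then lock it
 * with mlock(2). Unlike mmap(MAP_LOCKED), failing to populate and lock
 * the whole range is reported: the mapping is removed and MAP_FAILED is
 * returned with errno set by the failing call. */
static void *map_and_lock(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;

    if (mlock(p, len) != 0) {
        int saved = errno;   /* munmap() may overwrite errno */
        /* Assumes no other thread has unmapped or remapped this range
         * since mmap() returned; otherwise this could hit someone
         * else's mapping, which is the race the kernel cannot rule out. */
        munmap(p, len);
        errno = saved;
        return MAP_FAILED;
    }
    return p;   /* fully populated and locked on success */
}
```

Typical mlock failures for an unprivileged caller are ENOMEM or EPERM (RLIMIT_MEMLOCK) and EAGAIN (some pages could not be locked), so callers can distinguish "could not lock" from a plain mapping failure.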


URGH, this really blows chunks. It basically means MAP_LOCKED is pointless cruft and we might as well remove it.

Why not fix it proper?