Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section

From: Matthew Wilcox
Date: Thu Mar 22 2018 - 11:41:05 EST


On Thu, Mar 22, 2018 at 04:32:00PM +0100, Laurent Dufour wrote:
> On 21/03/2018 23:46, Matthew Wilcox wrote:
> > On Wed, Mar 21, 2018 at 02:45:44PM -0700, Yang Shi wrote:
> >> Marking the vma as deleted sounds good. The problem with my current approach
> >> is that a concurrent page fault may succeed if it accesses a section that
> >> has not been unmapped yet. Marking the vma as deleted would tell the page
> >> fault that the vma is no longer valid, so it can return SIGSEGV.
> >>
> >>> does not care; munmap will need to wait for the existing munmap operation
> >>
> >> Why doesn't mmap care? What about MAP_FIXED? It may fail unexpectedly, right?
> >
> > The other thing about MAP_FIXED that we'll need to handle is unmapping
> > > conflicts atomically. Say a program has a 200GB mapping and then calls
> > > mmap(MAP_FIXED) for another 200GB region on top of it. So I think page faults
> > are also going to have to wait for deleted vmas (then retry the fault)
> > rather than immediately raising SIGSEGV.
>
> Regarding the page fault, why not rely on the PTE locking?
>
> When munmap() clears the PTE it will have to hold the PTE lock, so this
> will serialize the access.
> If the page fault occurs before the mmap(MAP_FIXED), the mapped page will be
> removed when mmap(MAP_FIXED) does the cleanup. Fair enough.

The page fault handler will walk the VMA tree to find the correct
VMA and then find that the VMA is marked as deleted. If it assumes
that the VMA has been deleted because of munmap(), then it can raise
SIGSEGV immediately. But if the VMA is marked as deleted because of
mmap(MAP_FIXED), it must wait until the new VMA is in place.

I think I was wrong to describe VMAs as being *deleted*. I think we
instead need the concept of a *locked* VMA that page faults will block on.
Conceptually, it's a per-VMA rwsem, but I'd use a completion instead of
an rwsem since the only reason to write-lock the VMA is that it is
being deleted.
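
A minimal userspace sketch of the locked-VMA idea, to make the fault-side
behaviour concrete. This is not kernel code: the completion is emulated with
pthreads, the "VMA tree" is a single slot, and every name here (struct vma,
struct mm, vma_lock_for_removal(), vma_finish_removal(), handle_fault(),
fault_during_removal()) is hypothetical and chosen only for illustration.
It assumes the old VMA stays allocated until all waiters have woken up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete_all(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical, heavily simplified VMA: only the "locked" state. */
struct vma {
	bool locked;                 /* set while the mapping is torn down */
	struct completion unlocked;  /* signalled once teardown is finished */
};

/* One-slot stand-in for the VMA tree, protected by a plain mutex. */
struct mm {
	pthread_mutex_t lock;
	struct vma *vma;
};

enum fault_result { FAULT_OK, FAULT_SIGSEGV };

static void vma_init(struct vma *v)
{
	v->locked = false;
	init_completion(&v->unlocked);
}

/* Writer side, step 1: mark the VMA locked before the long unmap. */
static void vma_lock_for_removal(struct mm *mm)
{
	pthread_mutex_lock(&mm->lock);
	mm->vma->locked = true;
	pthread_mutex_unlock(&mm->lock);
}

/* Writer side, step 2: NULL for munmap(), a new VMA for mmap(MAP_FIXED). */
static void vma_finish_removal(struct mm *mm, struct vma *replacement)
{
	pthread_mutex_lock(&mm->lock);
	struct vma *old = mm->vma;
	assert(old && old->locked);
	mm->vma = replacement;
	pthread_mutex_unlock(&mm->lock);
	complete_all(&old->unlocked);   /* wake every blocked fault */
}

/* Fault side: block on a locked VMA, then redo the lookup. */
static enum fault_result handle_fault(struct mm *mm)
{
	for (;;) {
		pthread_mutex_lock(&mm->lock);
		struct vma *v = mm->vma;
		bool locked = v && v->locked;
		pthread_mutex_unlock(&mm->lock);

		if (!v)
			return FAULT_SIGSEGV;   /* nothing mapped any more */
		if (!locked)
			return FAULT_OK;        /* valid VMA, fault proceeds */
		wait_for_completion(&v->unlocked);
	}
}

struct scenario { struct mm *mm; enum fault_result result; };

static void *fault_thread(void *arg)
{
	struct scenario *s = arg;
	s->result = handle_fault(s->mm);
	return NULL;
}

/* Run one fault concurrently with the removal of a locked VMA. */
enum fault_result fault_during_removal(struct vma *replacement)
{
	struct vma old;
	struct mm mm = { .vma = &old };
	struct scenario s = { .mm = &mm, .result = FAULT_OK };
	pthread_t t;

	vma_init(&old);
	pthread_mutex_init(&mm.lock, NULL);
	vma_lock_for_removal(&mm);
	pthread_create(&t, NULL, fault_thread, &s);
	vma_finish_removal(&mm, replacement);
	pthread_join(t, NULL);          /* old VMA outlives the waiter */
	return s.result;
}
```

With this model, fault_during_removal(NULL) plays the munmap() case and the
fault ends in SIGSEGV, while passing an initialised replacement VMA plays the
mmap(MAP_FIXED) case and the fault succeeds against the new mapping. The
fault path never has to know which of the two operations locked the VMA; it
only waits and looks again.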