Re: [RFC PATCH 1/8] mm: mmap: unmap large mapping by section

From: Yang Shi
Date: Thu Mar 22 2018 - 12:49:24 EST




On 3/22/18 9:05 AM, Matthew Wilcox wrote:
On Thu, Mar 22, 2018 at 04:54:52PM +0100, Laurent Dufour wrote:
On 22/03/2018 16:40, Matthew Wilcox wrote:
On Thu, Mar 22, 2018 at 04:32:00PM +0100, Laurent Dufour wrote:
Regarding the page fault, why not rely on the PTE locking?

When munmap() unsets the PTE it will have to hold the PTE lock, so this
will serialize the access.
If the page fault occurs before the mmap(MAP_FIXED), the mapped page will be
removed when mmap(MAP_FIXED) does the cleanup. Fair enough.
The page fault handler will walk the VMA tree to find the correct
VMA and then find that the VMA is marked as deleted. If it assumes
that the VMA has been deleted because of munmap(), then it can raise
SIGSEGV immediately. But if the VMA is marked as deleted because of
mmap(MAP_FIXED), it must wait until the new VMA is in place.
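A toy C sketch of the decision described above, for illustration only. Every name here (enum vma_state, struct toy_vma, find_vma_toy(), classify_fault()) is hypothetical and not kernel API; a linear scan stands in for the VMA tree walk, and the "deleted" state is assumed to record why the VMA is going away.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reasons a VMA found by the fault handler may be unusable. */
enum vma_state {
    VMA_LIVE,              /* normal VMA: handle the fault */
    VMA_DELETED_MUNMAP,    /* removed by munmap(): nothing will replace it */
    VMA_DELETED_MAP_FIXED, /* being replaced by mmap(MAP_FIXED) */
};

struct toy_vma {
    unsigned long start, end; /* covers [start, end) */
    enum vma_state state;
};

enum fault_action {
    FAULT_NO_VMA,     /* address not mapped at all: SIGSEGV */
    FAULT_HANDLE,     /* proceed with the normal fault path */
    FAULT_SIGSEGV,    /* VMA is going away for good: SIGSEGV immediately */
    FAULT_WAIT_RETRY, /* sleep until the new VMA is in place, then retry */
};

/* Linear search stands in for the kernel's VMA tree walk. */
static const struct toy_vma *find_vma_toy(const struct toy_vma *vmas,
                                          size_t n, unsigned long addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= vmas[i].start && addr < vmas[i].end)
            return &vmas[i];
    return NULL;
}

static enum fault_action classify_fault(const struct toy_vma *vmas,
                                        size_t n, unsigned long addr)
{
    assert(vmas != NULL || n == 0);

    const struct toy_vma *vma = find_vma_toy(vmas, n, addr);
    if (!vma)
        return FAULT_NO_VMA;

    switch (vma->state) {
    case VMA_LIVE:
        return FAULT_HANDLE;
    case VMA_DELETED_MUNMAP:
        return FAULT_SIGSEGV;
    case VMA_DELETED_MAP_FIXED:
        return FAULT_WAIT_RETRY;
    }
    return FAULT_SIGSEGV; /* unreachable for valid states */
}
```

The cost of this scheme is that the deleted marker has to carry the reason, which is the extra state questioned below.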
I'm wondering if such complexity is required.
If the user-space process tries to access a page that another thread is
overwriting through mmap(MAP_FIXED), there is no guarantee whether it will
manipulate the *old* page or the *new* one.
Right; but it must return one or the other, it can't segfault.

I'd think it is up to the user process to handle that concurrency.
What needs to be guaranteed is that once mmap(MAP_FIXED) returns, the old pages
are no longer there, which is ensured by the mmap_sem and PTE locking.
Yes, and allowing the fault handler to return the *old* page risks the
old page being reinserted into the page tables after the unmapping task
has done its work.
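A toy C model of that race, assuming a single page-table entry guarded by a spinlock built on C11 atomic_flag. The names (struct toy_pte, toy_fault_install(), toy_unmap()) are hypothetical. It shows that the PTE lock does serialize the two sides, but a fault that resolved the old page before the unmap and installs it afterwards still succeeds, because the "entry is empty" check cannot tell that the VMA has been torn down.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical toy model: one page-table entry guarded by its PTE lock. */
struct toy_pte {
    atomic_flag ptl; /* stands in for the page-table spinlock */
    void *page;      /* NULL plays the role of an empty (pte_none) entry */
};

static void toy_ptl_lock(struct toy_pte *pte)
{
    while (atomic_flag_test_and_set_explicit(&pte->ptl, memory_order_acquire))
        ; /* spin until the holder releases the lock */
}

static void toy_ptl_unlock(struct toy_pte *pte)
{
    atomic_flag_clear_explicit(&pte->ptl, memory_order_release);
}

/* Fault side: install @page only if the entry is still empty. */
static int toy_fault_install(struct toy_pte *pte, void *page)
{
    int installed = 0;

    assert(page != NULL); /* installing NULL would look like an empty entry */
    toy_ptl_lock(pte);
    if (pte->page == NULL) {
        pte->page = page;
        installed = 1;
    }
    toy_ptl_unlock(pte);
    return installed;
}

/* Unmap side: clear the entry under the same lock and return what was there. */
static void *toy_unmap(struct toy_pte *pte)
{
    void *old;

    toy_ptl_lock(pte);
    old = pte->page;
    pte->page = NULL;
    toy_ptl_unlock(pte);
    return old;
}
```

In the sequence "fault looks up old page, unmap clears the entry, fault installs", the lock only orders the two critical sections; it does not stop the stale page from landing after the unmap has finished.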

It's *really* rare to page-fault on a VMA which is in the middle of
being replaced. Why are you trying to optimise it?

I think I was wrong to describe VMAs as being *deleted*. I think we
instead need the concept of a *locked* VMA that page faults will block on.
Conceptually, it's a per-VMA rwsem, but I'd use a completion instead of
an rwsem since the only reason to write-lock the VMA is because it is
being deleted.
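A userspace C sketch of the per-VMA "locked" state, assuming a kernel-style completion modelled with a pthread mutex and condition variable. All names (struct toy_completion, struct toy_vma, toy_vma_begin_delete(), toy_fault_wait_if_locked(), etc.) are hypothetical. It deliberately ignores the window between the fault's check and the actual PTE update.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel completion: a one-shot event. */
struct toy_completion {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool done;
};

static void toy_init_completion(struct toy_completion *c)
{
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->done = false;
}

/* Wake every current waiter; later waiters return immediately. */
static void toy_complete_all(struct toy_completion *c)
{
    pthread_mutex_lock(&c->lock);
    c->done = true;
    pthread_cond_broadcast(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Sleep until toy_complete_all() has been called. */
static void toy_wait_for_completion(struct toy_completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    pthread_mutex_unlock(&c->lock);
}

/* Hypothetical VMA carrying the "locked" state page faults block on. */
struct toy_vma {
    atomic_bool locked;            /* set once deletion has started */
    struct toy_completion deleted; /* completed when deletion is finished */
};

static void toy_vma_init(struct toy_vma *vma)
{
    atomic_init(&vma->locked, false);
    toy_init_completion(&vma->deleted);
}

/* Deleter: the only reason to "write-lock" the VMA. */
static void toy_vma_begin_delete(struct toy_vma *vma)
{
    assert(!atomic_load(&vma->locked)); /* a VMA is deleted at most once */
    atomic_store(&vma->locked, true);
}

/* Deleter: old VMA gone (and, for MAP_FIXED, the new one installed). */
static void toy_vma_end_delete(struct toy_vma *vma)
{
    toy_complete_all(&vma->deleted);
}

/*
 * Fault side: returns false if the VMA is usable; otherwise sleeps until
 * the deletion finishes and returns true, meaning "redo the VMA lookup".
 */
static bool toy_fault_wait_if_locked(struct toy_vma *vma)
{
    if (!atomic_load(&vma->locked))
        return false;
    toy_wait_for_completion(&vma->deleted);
    return true;
}
```

A completion fits because the write side is never "released" back to readers: the VMA is simply gone, and every sleeping fault redoes the lookup, finding either the replacement VMA or nothing.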
Such a lock would only make sense in the case of mmap(MAP_FIXED), since when
the VMA is just removed there is no need to wait. Is that right?
I can't think of another reason. I suppose we could mark the VMA as
locked-for-deletion or locked-for-replacement and have the SIGSEGV happen
early. But I'm not sure that optimising for SIGSEGVs is a worthwhile
use of our time. Just always have the pagefault sleep for a deleted VMA.

It sounds worthwhile to me. If every page fault has to sleep until the VMA deletion is done, isn't that equivalent to waiting for mmap_sem all the time?

Yang