Re: [PATCH] hugetlbfs: Take read_lock on i_mmap for PMD sharing

From: Mike Kravetz
Date: Fri Nov 08 2019 - 20:47:55 EST


On 11/8/19 11:10 AM, Mike Kravetz wrote:
> On 11/7/19 6:04 PM, Davidlohr Bueso wrote:
>> On Thu, 07 Nov 2019, Mike Kravetz wrote:
>>
>>> Note that huge_pmd_share now increments the page count with the semaphore
>>> held just in read mode. It is OK to do increments in parallel without
>>> synchronization. However, we don't want anyone else changing the count
>>> while that check in huge_pmd_unshare is happening. Hence, the need for
>>> taking the semaphore in write mode.
>>
>> This would be a nice addition to the changelog methinks.
>
> Last night I remembered there is one place where we currently take
> i_mmap_rwsem in read mode and potentially call huge_pmd_unshare. That
> is in try_to_unmap_one. Yes, there is a potential race here today.

Actually, there is no race there today. Callers of huge_pmd_unshare
hold the page table lock, so that lock already synchronizes the unshare
calls made from page migration and page poisoning.

> But that race is somewhat contained as you need two threads doing some
> combination of page migration and page poisoning to race. This change
> now allows migration or poisoning to race with page fault. I would
> really prefer if we do not open up the race window in this manner.

But we do open a race window by changing huge_pmd_share to take
i_mmap_rwsem in read mode, as the original patch does.

Here is the additional code needed to take the semaphore in write mode
for the huge_pmd_unshare calls via try_to_unmap_one. We would need to
combine this with Longman's patch. Please take a look and provide feedback.
Some of the changes are subtle, especially the exception for MAP_PRIVATE
mappings, but I tried to add sufficient comments.