Re: [PATCH] mm: fix account pmd page to the process

From: Mike Kravetz
Date: Thu Jun 16 2016 - 12:05:45 EST


On 06/16/2016 08:43 AM, Michal Hocko wrote:
> [It seems that this patch has been sent several times, and this
> particular copy didn't CC Kirill, who added this code. CCing him now.]
>
> On Thu 16-06-16 17:42:14, Michal Hocko wrote:
>> On Thu 16-06-16 19:36:11, zhongjiang wrote:
>>> From: zhong jiang <zhongjiang@xxxxxxxxxx>
>>>
>>> When a process acquires a pmd table shared by another process, we
>>> account it to the current process. Otherwise, a race resulted in
>>> another task having already set the pud entry, so there is no need
>>> to increase the count.
>>>
>>> Signed-off-by: zhong jiang <zhongjiang@xxxxxxxxxx>
>>> ---
>>> mm/hugetlb.c | 5 ++---
>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 19d0d08..3b025c5 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
>>>  	if (pud_none(*pud)) {
>>>  		pud_populate(mm, pud,
>>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
>>> -	} else {
>>> +	} else
>>>  		put_page(virt_to_page(spte));
>>> -		mm_inc_nr_pmds(mm);
>>> -	}
>>
>> The code is quite puzzling but is this correct? Shouldn't we rather do
>> mm_dec_nr_pmds(mm) in that path to undo the previous inc?

I agree that the code is quite puzzling. :(

However, if this were an issue I would have expected to see some reports.
Oracle DB makes use of this feature (shared page tables) and if the pmd
count is wrong we would catch it in check_mm() at exit time.

Upon closer examination, I believe the code in question is never executed.
Note the callers of huge_pmd_share. The calling code looks like:

	if (want_pmd_share() && pud_none(*pud))
		pte = huge_pmd_share(mm, addr, pud);
	else
		pte = (pte_t *)pmd_alloc(mm, pud, addr);

Therefore, we do not call huge_pmd_share() unless pud_none(*pud) is true,
whereas the else branch in question is only executed when !pud_none(*pud).

I think the entire if/else statement can be removed. We already know
that pud_none(*pud) holds, so we can just call pud_populate().
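
The reasoning above can be sketched as a small single-threaded userspace
model in C. All model_* names and the two counters are hypothetical
stand-ins and not kernel code, and locking and concurrent faults are
deliberately not modeled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pud entry; pmd == NULL models pud_none(). */
struct model_pud {
	void *pmd;
};

static int populate_calls;	/* times the pud_none() branch ran */
static int else_calls;		/* times the else branch ran */

static bool model_pud_none(const struct model_pud *pud)
{
	return pud->pmd == NULL;
}

/* Models only the if/else under discussion in huge_pmd_share(). */
static void *model_huge_pmd_share(struct model_pud *pud, void *spte)
{
	/* Model assumption: a candidate shared table was found. */
	assert(spte != NULL);

	if (model_pud_none(pud)) {
		pud->pmd = spte;	/* stands in for pud_populate() */
		populate_calls++;
	} else {
		else_calls++;		/* stands in for the else branch */
	}
	return pud->pmd;
}

/* Models the caller's guard quoted above. */
static void *model_caller(struct model_pud *pud, void *spte, bool want_share)
{
	if (want_share && model_pud_none(pud))
		return model_huge_pmd_share(pud, spte);
	return pud->pmd;	/* the pmd_alloc() path is not modeled */
}
```

Calling model_caller() twice on the same model_pud takes the populate
branch on the first call and skips model_huge_pmd_share() entirely on the
second, so else_calls stays at zero. This only illustrates the
single-threaded case and says nothing about what happens between the
caller's check and taking the lock.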

--
Mike Kravetz

>>
>>> +
>>>  	spin_unlock(ptl);
>>>  out:
>>>  	pte = (pte_t *)pmd_alloc(mm, pud, addr);
>>> --
>>> 1.8.3.1
>>
>> --
>> Michal Hocko
>> SUSE Labs
>