Re: [PATCH] mm: fix account pmd page to the process

From: Michal Hocko
Date: Thu Jun 16 2016 - 12:31:27 EST


On Thu 16-06-16 09:05:23, Mike Kravetz wrote:
> On 06/16/2016 08:43 AM, Michal Hocko wrote:
> > [It seems that this patch has been sent several times and this
> > particular copy didn't CC Kirill, who added this code. CCing him now]
> >
> > On Thu 16-06-16 17:42:14, Michal Hocko wrote:
> >> On Thu 16-06-16 19:36:11, zhongjiang wrote:
> >>> From: zhong jiang <zhongjiang@xxxxxxxxxx>
> >>>
> >>> When a process acquires a pmd table shared by another process, we
> >>> increase the pmd count of the current process. Otherwise, a race
> >>> resulted in another task having already set the pud entry, so there
> >>> is no need to increase it.
> >>>
> >>> Signed-off-by: zhong jiang <zhongjiang@xxxxxxxxxx>
> >>> ---
> >>> mm/hugetlb.c | 5 ++---
> >>> 1 file changed, 2 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >>> index 19d0d08..3b025c5 100644
> >>> --- a/mm/hugetlb.c
> >>> +++ b/mm/hugetlb.c
> >>> @@ -4189,10 +4189,9 @@ pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
> >>>  	if (pud_none(*pud)) {
> >>>  		pud_populate(mm, pud,
> >>>  				(pmd_t *)((unsigned long)spte & PAGE_MASK));
> >>> -	} else {
> >>> +	} else
> >>>  		put_page(virt_to_page(spte));
> >>> -		mm_inc_nr_pmds(mm);
> >>> -	}
> >>
> >> The code is quite puzzling but is this correct? Shouldn't we rather do
> >> mm_dec_nr_pmds(mm) in that path to undo the previous inc?
>
> I agree that the code is quite puzzling. :(
>
> However, if this were an issue I would have expected to see some reports.
> Oracle DB makes use of this feature (shared page tables) and if the pmd
> count is wrong we would catch it in check_mm() at exit time.
>
> Upon closer examination, I believe the code in question is never executed.
> Note the callers of huge_pmd_share. The calling code looks like:
>
> 	if (want_pmd_share() && pud_none(*pud))
> 		pte = huge_pmd_share(mm, addr, pud);
> 	else
> 		pte = (pte_t *)pmd_alloc(mm, pud, addr);
>
> Therefore, we do not call huge_pmd_share unless pud_none(*pud). The
> code in question is only executed when !pud_none(*pud).

My understanding is that the check is needed after we retake the page
lock because we might have raced with another thread. But it's been
quite some time since I last looked at the hugetlb locking and page
table sharing code.
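
To make the suspected race concrete, here is a minimal userspace sketch
of the "check, take the lock, check again" pattern. It is not the kernel
code: struct table, struct slot and install_shared() are hypothetical
stand-ins for the shared pmd page, the pud entry and huge_pmd_share(),
and it deliberately says nothing about where the pmd accounting belongs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared pmd page and its refcount. */
struct table {
	atomic_int refs;
};

/* Hypothetical stand-in for the pud entry and the lock guarding it. */
struct slot {
	pthread_mutex_t lock;
	struct table *entry;
};

/*
 * The caller is assumed to have seen an empty slot earlier, without
 * holding the lock. Returns 1 if we installed the table, 0 if another
 * thread filled the slot before we got the lock.
 */
static int install_shared(struct slot *s, struct table *t)
{
	int installed = 0;

	assert(s != NULL && t != NULL);

	/* Take a reference up front, before the lock is held. */
	atomic_fetch_add(&t->refs, 1);

	pthread_mutex_lock(&s->lock);
	if (s->entry == NULL) {
		/* Still empty: the earlier unlocked check remains valid. */
		s->entry = t;
		installed = 1;
	} else {
		/* Lost the race: undo the reference taken above. */
		atomic_fetch_sub(&t->refs, 1);
	}
	pthread_mutex_unlock(&s->lock);

	return installed;
}
```

If the slot can be filled between the caller's unlocked check and the
lock acquisition, the else branch is reachable even though every caller
checked for an empty slot first.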

--
Michal Hocko
SUSE Labs