Re: mm: fix BUG in __split_huge_page_pmd

From: Naoya Horiguchi
Date: Tue Oct 15 2013 - 16:16:38 EST


On Tue, Oct 15, 2013 at 09:44:28PM +0200, Andrea Arcangeli wrote:
> On Tue, Oct 15, 2013 at 03:28:50PM -0400, Naoya Horiguchi wrote:
> > On Tue, Oct 15, 2013 at 08:55:10PM +0200, Andrea Arcangeli wrote:
> > > On Tue, Oct 15, 2013 at 10:53:10AM -0700, Hugh Dickins wrote:
> > > > I'm afraid Andrea's mail about concurrent madvises gives me far more
> > > > to think about than I have time for: seems to get into problems he
> > > > knows a lot about but I'm unfamiliar with. If this patch looks good
> > > > for now on its own, let's put it in; but no problem if you guys prefer
> > > > to wait for a fuller solution of more problems, we can ride with this
> > > > one internally for the moment.
> > >
> > > I'm very happy with the patch and I think it's a correct fix for the
> > > COW scenario which is deterministic so the looping makes a meaningful
> > > difference for it. If we didn't loop, part of the copied page
> > > wouldn't be zapped after the COW.
> >
> > I like this patch, too.
> >
> > If we have the loop in __split_huge_page_pmd as suggested in this patch,
> > can we assume that the pmd is stable after __split_huge_page_pmd returns?
> > If that's true, we can remove the pmd_none_or_trans_huge_or_clear_bad check
> > on the caller side (zap_pmd_range and some other page table walking code).
>
> We can assume it is stable for the deterministic cases where the
> looping is useful and split_huge_page creates a non-huge pmd that points to
> a regular pte.
>
> But we cannot remove pmd_none_or_trans_huge_or_clear_bad after it for
> the other non-deterministic cases that I described in my previous
> email. Looping still provides no guarantee that when the function
> returns, the pmd is not huge. So for safety we still need to handle
> the non-deterministic case and just discard it through
> pmd_none_or_trans_huge_or_clear_bad.

OK, so this check is necessary. But pmd_none_or_trans_huge_or_clear_bad
doesn't clear the pmd when pmd_trans_huge is true, so zap_pmd_range
seems to do nothing on such an irregular pmd_trans_huge. It looks
better to me for zap_pmd_range to retry the loop on the same address
instead of doing 'goto next'.

The reason I ask is that I have recently been studying the page table
walker, and some related code does retry in a similar situation.
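
To make the difference concrete, here is a toy userspace model of the two
behaviours. It is only a sketch and not kernel code: every toy_* name is
made up for illustration, the "race" is simulated with a counter, and the
real zap_pmd_range has more cases than this. It assumes a split can be
undone a bounded number of times, so the retry terminates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code; all toy_* names are hypothetical. */
enum toy_pmd { TOY_PMD_NONE, TOY_PMD_HUGE, TOY_PMD_TABLE };

struct toy_mm {
	enum toy_pmd pmd[4];
	int races_left;	/* how many splits a simulated racing fault defeats */
	int zapped;	/* number of entries actually zapped */
};

/*
 * Stand-in for the split: turns a huge entry into a regular table,
 * unless the simulated race leaves it huge again.
 */
static void toy_split(struct toy_mm *mm, size_t i)
{
	if (mm->pmd[i] != TOY_PMD_HUGE)
		return;
	if (mm->races_left > 0) {
		mm->races_left--;	/* entry is still huge on return */
		return;
	}
	mm->pmd[i] = TOY_PMD_TABLE;
}

/*
 * Stand-in for the none-or-huge check: it reports a huge entry
 * but does not clear it.
 */
static bool toy_none_or_huge(const struct toy_mm *mm, size_t i)
{
	return mm->pmd[i] == TOY_PMD_NONE || mm->pmd[i] == TOY_PMD_HUGE;
}

/* retry == false models 'goto next'; retry == true retries the same slot. */
static void toy_zap_range(struct toy_mm *mm, size_t n, bool retry)
{
	for (size_t i = 0; i < n; i++) {
again:
		if (mm->pmd[i] == TOY_PMD_HUGE)
			toy_split(mm, i);
		if (toy_none_or_huge(mm, i)) {
			if (retry && mm->pmd[i] == TOY_PMD_HUGE)
				goto again;	/* same address again */
			continue;		/* 'goto next' */
		}
		mm->pmd[i] = TOY_PMD_NONE;	/* zap the regular table */
		mm->zapped++;
	}
}
```

With one simulated race on a huge entry, toy_zap_range(..., false) leaves
that entry huge and untouched, while toy_zap_range(..., true) goes around
once more and zaps it.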

Thanks,
Naoya Horiguchi