Re: [PATCH] mm, numa: Fix bad pmd by atomically check for pmd_trans_huge when marking page tables prot_numa

From: Mel Gorman
Date: Tue Apr 11 2017 - 04:29:24 EST


On Mon, Apr 10, 2017 at 03:09:03PM -0700, Andrew Morton wrote:
> On Mon, 10 Apr 2017 19:07:14 +0100 Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Mon, Apr 10, 2017 at 12:49:40PM -0500, Zi Yan wrote:
> > > On 10 Apr 2017, at 12:20, Mel Gorman wrote:
> > >
> > > > On Mon, Apr 10, 2017 at 11:45:08AM -0500, Zi Yan wrote:
> > > >>> While this could be fixed with heavy locking, it's only necessary to
> > > >>> make a copy of the PMD on the stack during change_pmd_range and avoid
> > > >>> races. A new helper is created for this as the check if quite subtle and the
> > > >>> existing similar helpful is not suitable. This passed 154 hours of testing
> > > >>> (usually triggers between 20 minutes and 24 hours) without detecting bad
> > > >>> PMDs or corruption. A basic test of an autonuma-intensive workload showed
> > > >>> no significant change in behaviour.
> > > >>>
> > > >>> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
> > > >>> Cc: stable@xxxxxxxxxxxxxxx
> > > >>
> > > >> Does this patch fix the same problem fixed by Kirill's patch here?
> > > >> https://lkml.org/lkml/2017/3/2/347
> > > >>
> > > >
> > > > I don't think so. The race I'm concerned with is due to locks not being
> > > > held and is in a different path.
> > >
> > > I do not agree. Kirill's patch is fixing the same race problem but in
> > > zap_pmd_range().
> > >
> > > The original autoNUMA code first clears PMD then sets it to protnone entry.
> > > pmd_trans_huge() does not return TRUE because it saw cleared PMD, but
> > > pmd_none_or_clear_bad() later saw the protnone entry and reported it as bad.
> > > Is this the problem you are trying solve?
> > >
> > > Kirill's patch will pmdp_invalidate() the PMD entry, which keeps _PAGE_PSE bit,
> > > so pmd_trans_huge() will return TRUE. In this case, it also fixes
> > > your race problem in change_pmd_range().
> > >
> > > Let me know if I miss anything.
> > >
> >
> > Ok, now I see. I think you're correct and I withdraw the patch.
>
> I have Kirrill's
>
> thp-reduce-indentation-level-in-change_huge_pmd.patch
> thp-fix-madv_dontneed-vs-numa-balancing-race.patch
> mm-drop-unused-pmdp_huge_get_and_clear_notify.patch
> thp-fix-madv_dontneed-vs-madv_free-race.patch
> thp-fix-madv_dontneed-vs-madv_free-race-fix.patch
> thp-fix-madv_dontneed-vs-clear-soft-dirty-race.patch
>
> scheduled for 4.12-rc1. It sounds like
> thp-fix-madv_dontneed-vs-madv_free-race.patch and
> thp-fix-madv_dontneed-vs-madv_free-race.patch need to be boosted to
> 4.11 and stable?

Arguably all of them deal with different classes of race. The first two
should be tagged for any stable kernel after 3.15 because that's the
only one I know for certain occurs in the field albeit not on a mainline
kernel. There will be conflicts on older kernels due to changes in the
PMD locking API and it'll be up to the tree maintainers and patch owners
if they want to backport or not.

--
Mel Gorman
SUSE Labs