Re: [PATCH 15/18] sched: Set preferred NUMA node based on number of private faults

From: Peter Zijlstra
Date: Wed Jul 31 2013 - 05:34:51 EST


On Wed, Jul 31, 2013 at 10:29:38AM +0100, Mel Gorman wrote:
> > Hurmph, I just stumbled upon this PMD 'trick' and I'm not at all sure I
> > like it. If an application pre-faults/initializes its memory with the
> > main thread, we'll collapse it into PMDs, and forever thereafter (by
> > virtue of do_pmd_numa_page()) they'll all stay the same, resulting in
> > PMD granularity.
> >
>
> Potentially yes. When that PMD trick was introduced, it was because the
> cost of faults was very high due to a high scanning rate. The trick
> mitigated worst-case scenarios until faults were properly accounted for
> and the scan rates were better controlled. As these *should* be
> addressed by this series, I think I will add a patch to kick away this
> PMD crutch and see how it looks in profiles.
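
(For reference, the PMD-granularity handling being discussed looks roughly
like the sketch below. This is a simplified illustration, not the actual
do_pmd_numa_page() code, and the task_numa_fault() call is only indicative
of the per-page accounting that happens in that pass.)

#include <linux/mm.h>
#include <linux/sched.h>

/*
 * Simplified sketch of PMD-granularity NUMA hinting fault handling:
 * one fault on any page under the PMD walks and accounts every page
 * under that PMD in a single pass, so the whole range tends to stay
 * together at PMD granularity.
 */
static void sketch_pmd_numa_fault(struct vm_area_struct *vma,
				  unsigned long addr, pmd_t *pmdp)
{
	unsigned long start = addr & PMD_MASK;
	unsigned long end = start + PMD_SIZE;
	pte_t *pte, *orig_pte;
	spinlock_t *ptl;

	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmdp, start, &ptl);
	for (addr = start; addr < end; addr += PAGE_SIZE, pte++) {
		struct page *page;

		if (!pte_present(*pte))
			continue;
		page = vm_normal_page(vma, addr, *pte);
		if (!page)
			continue;
		/* Each page is accounted against the node it sits on now. */
		task_numa_fault(page_to_nid(page), 1, false);
	}
	pte_unmap_unlock(orig_pte, ptl);
}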

I've been thinking about this a bit, and I think we should split these
PMD-grouped ranges and THP pages when we get shared faults on them from
different nodes, and refuse THP collapses when the pages are on
different nodes.
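
(A rough sketch of what I mean; the predicates below are hypothetical, and
in particular page_last_fault_nid() only stands in for however we end up
recording the last node that faulted on the huge page.)

#include <linux/mm.h>

/*
 * Sketch only, not a real patch: split a THP once a second node
 * touches it, and refuse a khugepaged collapse when the subpages
 * are not all on one node.
 */
static bool thp_shared_fault_from_other_node(struct page *hpage,
					     int fault_nid)
{
	int last_nid = page_last_fault_nid(hpage);	/* hypothetical helper */

	return last_nid != -1 && last_nid != fault_nid;
}

static bool pages_all_on_node(struct page **pages, int nr_pages, int nid)
{
	int i;

	for (i = 0; i < nr_pages; i++)
		if (page_to_nid(pages[i]) != nid)
			return false;
	return true;	/* only collapse when this holds */
}

The huge-page NUMA fault path would then call split_huge_page() when the
first predicate is true, and khugepaged would bail out of the collapse
when the second is false.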

With the exception that, when we introduce the interleave mempolicies, we
should define 'different node' as being outside of the interleave mask.
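
(Roughly, the "different node" test then becomes a mask test; sketch only,
the helper name is made up.)

#include <linux/nodemask.h>

/*
 * With an interleave mempolicy, only nodes outside the interleave
 * mask count as "different" for the split/no-collapse decisions above.
 */
static bool nid_outside_interleave_mask(int nid, const nodemask_t *ilmask)
{
	return !node_isset(nid, *ilmask);
}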