Re: mm, vmscan: commit makes PAE kernel crash nightly (bisected)

From: Mel Gorman
Date: Tue Jan 17 2017 - 09:21:21 EST


On Tue, Jan 17, 2017 at 02:52:28PM +0100, Michal Hocko wrote:
> On Mon 16-01-17 11:09:34, Mel Gorman wrote:
> [...]
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 532a2a750952..46aac487b89a 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2684,6 +2684,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> > continue;
> >
> > if (sc->priority != DEF_PRIORITY &&
> > + !buffer_heads_over_limit &&
> > !pgdat_reclaimable(zone->zone_pgdat))
> > continue; /* Let kswapd poll it */
>
> I think we should rather remove pgdat_reclaimable here. This sounds like
> the wrong layer to decide whether we want to reclaim and how much.
>

I had considered that, but it would also be important to add the other
32-bit patches you have posted to see the impact. Because of the ratio of
LRU pages to slab pages, removing pgdat_reclaimable on its own may not make
a difference, but that possibility needs to be ruled out.

> But even that won't help very much I am afraid. As I've noted in the
> other response as long as we will scale the slab shrinking based on
> nr_scanned we will have a problem with situations where slab outnumbers
> lru lists too much. I do not have a good idea how to fix that though...
>

Right now, I don't either, other than a heavy-handed approach of checking
a) whether it's a pgdat with a highmem node and b) whether the ratio of LRU
pages to slab pages in the lower zones is out of whack, and if both hold,
ignoring nr_scanned for the slab shrinker.
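
For illustration only, that check might look something like the untested
sketch below. The helper name and the 8:1 threshold are placeholders I
made up, not anything measured:

static bool lowmem_slab_skewed(pg_data_t *pgdat)
{
	unsigned long lru = 0, slab = 0;
	bool highmem = false;
	int i;

	for (i = 0; i < MAX_NR_ZONES; i++) {
		struct zone *zone = &pgdat->node_zones[i];

		if (!populated_zone(zone))
			continue;
		if (is_highmem_idx(i)) {
			highmem = true;
			continue;
		}

		/* Tally LRU and reclaimable slab on the lower zones only */
		lru += zone_page_state(zone, NR_ZONE_INACTIVE_FILE) +
		       zone_page_state(zone, NR_ZONE_ACTIVE_FILE) +
		       zone_page_state(zone, NR_ZONE_INACTIVE_ANON) +
		       zone_page_state(zone, NR_ZONE_ACTIVE_ANON);
		slab += zone_page_state(zone, NR_SLAB_RECLAIMABLE);
	}

	/* Placeholder ratio: 8x more slab than LRU counts as out of whack */
	return highmem && slab > (lru << 3);
}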

Before prototyping such a thing, I'd like to hear the outcome of this
heavy hack and then add your 32-bit patches on top. If the problem is
still there, then I'd next look at taking slab pages into account in
pgdat_reclaimable() instead of an outright removal, which would have a
much wider impact. If that doesn't work, then I'll prototype a
heavy-handed forced slab reclaim for when the lower zones are almost all
slab pages.
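
As a rough illustration of that second option, and nothing more than
that, pgdat_reclaimable_pages() could be taught about reclaimable slab
along these lines:

unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat)
{
	unsigned long nr;

	nr = node_page_state_snapshot(pgdat, NR_ACTIVE_FILE) +
	     node_page_state_snapshot(pgdat, NR_INACTIVE_FILE) +
	     node_page_state_snapshot(pgdat, NR_ISOLATED_FILE);

	if (get_nr_swap_pages() > 0)
		nr += node_page_state_snapshot(pgdat, NR_ACTIVE_ANON) +
		      node_page_state_snapshot(pgdat, NR_INACTIVE_ANON) +
		      node_page_state_snapshot(pgdat, NR_ISOLATED_ANON);

	/*
	 * New: count reclaimable slab so that nodes where slab dwarfs
	 * the LRUs are not prematurely treated as unreclaimable.
	 */
	nr += sum_zone_node_page_state(pgdat->node_id, NR_SLAB_RECLAIMABLE);

	return nr;
}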

--
Mel Gorman
SUSE Labs