Re: [PATCH] vmscan: skip freeing memory from zones with lots free

From: Andrew Morton
Date: Sat Nov 29 2008 - 16:58:30 EST


On Sat, 29 Nov 2008 16:35:45 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:

> Andrew Morton wrote:
> > On Sat, 29 Nov 2008 13:59:21 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:
> >
> >> Andrew Morton wrote:
> >>> On Sat, 29 Nov 2008 13:41:34 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:
> >>>
> >>>> Andrew Morton wrote:
> >>>>> On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:
> >>>>>
> >>>>>>> Will this new patch reintroduce the problem which
> >>>>>>> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
> >>>> No, that problem is already taken care of by the fact that
> >>>> active pages always get deactivated in the current VM,
> >>>> regardless of whether or not they were referenced.
> >>> err, sorry, that was the wrong commit.
> >>> 26e4931632352e3c95a61edac22d12ebb72038fe _introduced_ the problem, as
> >>> predicted in the changelog.
> >>>
> >>> 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 later fixed it up.
> >> The patch I sent in this thread does not do any baling out,
> >> it only skips zones where the number of free pages is more
> >> than 4 times zone->pages_high.
> >
> > But that will have the same effect as baling out. Moreso, in fact.
>
> Kswapd already does the same in balance_pgdat.
>
> Unequal pressure is sometimes desired, because allocation
> pressure is not equal between zones. Having lots of
> lowmem allocations should not lead to gigabytes of swapped
> out highmem. A numactl pinned application should not cause
> memory on other NUMA nodes to be swapped out.
>
> Equal pressure between the zones makes sense when allocation
> pressure is similar.
>
> When allocation pressure is different, we have a choice
> between evicting potentially useful data from memory or
> applying uneven pressure on zones.
>
> >> Equal pressure is still applied to the other zones.
> >>
> >> This should not be a problem since we do not enter direct
> >> reclaim unless the free pages in every zone in our zonelist
> >> are below zone->pages_low.
> >>
> >> Zone skipping is only done by tasks that have been in the
> >> direct reclaim code for a long time.
> >
> >>From 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3:
> >
> > We currently have a problem with the balancing of reclaim
> > between zones: much more reclaim happens against highmem than
> > against lowmem.
> >
> > This problem will be reintroduced, will it not?
>
> We already have that behaviour in balance_pgdat().

I expect that was the case back in March 2004.
265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 removed the bale-out only for
the direct reclaim path.

> We do not do any reclaim on zones higher than the first
> zone where the zone_watermark_ok call returns true:
>
> if (!zone_watermark_ok(zone, order, zone->pages_high,
> 0, 0)) {
> end_zone = i;
> break;
> }
>
> Further down in balance_pgdat(), we skip reclaiming from zones
> that have way too much memory free.
>
> /*
> * We put equal pressure on every zone, unless one
> * zone has way too many pages free already.
> */
> if (!zone_watermark_ok(zone, order, 8*zone->pages_high,
> end_zone, 0))
> shrink_zone(priority, zone, &sc);
>
> All my patch does is add one of these sanity checks to the
> direct reclaim path.

It's a change in behaviour, not a "sanity check"!

The bottom line here is that we don't fully understand the problem
which 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 fixed, hence we cannot
say whether this proposed change will reintroduce it.

Why did it matter that "much more reclaim happens against highmem than
against lowmem"? What were the observeable effects of this?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/