Re: [PATCH] vmscan: bail out of direct reclaim afterswap_cluster_max pages

From: Andrew Morton
Date: Sat Nov 29 2008 - 02:25:18 EST


On Fri, 28 Nov 2008 06:23:58 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:

> When the VM is under pressure, it can happen that several direct reclaim
> processes are in the pageout code simultaneously. It also happens that
> the reclaiming processes run into mostly referenced, mapped and dirty
> pages in the first round.
>
> This results in multiple direct reclaim processes having a lower
> pageout priority, which corresponds to a higher target of pages to
> scan.
>
> This in turn can result in each direct reclaim process freeing
> many pages. Together, they can end up freeing way too many pages.
>
> This kicks useful data out of memory (in some cases more than half
> of all memory is swapped out). It also impacts performance by
> keeping tasks stuck in the pageout code for too long.
>
> A 30% improvement in hackbench has been observed with this patch.
>
> The fix is relatively simple: in shrink_zone() we can check how many
> pages we have already freed, direct reclaim tasks break out of the
> scanning loop if they have already freed enough pages and have reached
> a lower priority level.
>
> We do not break out of shrink_zone() when priority == DEF_PRIORITY,
> to ensure that equal pressure is applied to every zone in the common
> case.
>
> However, in order to do this we do need to know how many pages we already
> freed, so move nr_reclaimed into scan_control.
>

Again, it's just awful to make a change which has already be tried and
rejected. Especially as we don't really fully understand why it was
rejected (do we?). The information we seek may well be in the mailing
list archives somewhere.

> + /*
> + * On large memory systems, scan >> priority can become
> + * really large. This is fine for the starting priority;
> + * we want to put equal scanning pressure on each zone.
> + * However, if the VM has a harder time of freeing pages,
> + * with multiple processes reclaiming pages, the total
> + * freeing target can get unreasonably large.
> + */
> + if (sc->nr_reclaimed > sc->swap_cluster_max &&
> + priority < DEF_PRIORITY && !current_is_kswapd())
> + break;

Fingers crossed, it might be that the `priority < DEF_PRIORITY' here
will save our bacon from <whatever it was>. But it sure would be good
to know.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/