Re: [BUG] fatal hang untarring 90GB file, possibly writeback related.

From: Colin Ian King
Date: Thu Apr 28 2011 - 13:11:08 EST


On Thu, 2011-04-28 at 16:08 +0100, Mel Gorman wrote:

[ text deleted ]

> Another consequence of this patch is that when high order allocations
> are in progress (is the test case fork heavy in any way for
> example? alternatively, it might be something in the storage stack
> that requires high-order allocs) we are no longer necessarily going
> to sleep because of the should_continue_reclaim() check. This could
> explain kswapd-at-99% but would only apply if CONFIG_COMPACTION is
> set (does unsetting CONFIG_COMPACTION help?). If the bug only triggers
> for CONFIG_COMPACTION, does the following *untested* patch help any?

I'm afraid this patch didn't help either.
>
> (as a warning, I'm offline Friday until Tuesday morning. I'll try
> check mail over the weekend but it's unlikely I'll find a terminal
> or be allowed to use it without an ass kicking)

Ditto, me too. I will pick this up on Tuesday.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 148c6e6..c74a501 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1842,15 +1842,22 @@ static inline bool should_continue_reclaim(struct zone *zone,
>  		return false;
>
>  	/*
> -	 * If we failed to reclaim and have scanned the full list, stop.
> -	 * NOTE: Checking just nr_reclaimed would exit reclaim/compaction far
> -	 *       faster but obviously would be less likely to succeed
> -	 *       allocation. If this is desirable, use GFP_REPEAT to decide
> -	 *       if both reclaimed and scanned should be checked or just
> -	 *       reclaimed
> +	 * For direct reclaimers
> +	 *   If we failed to reclaim and have scanned the full list, stop.
> +	 *   The caller will check congestion and sleep if necessary until
> +	 *   some IO completes.
> +	 * For kswapd
> +	 *   Check just nr_reclaimed. If we are failing to reclaim, we
> +	 *   want to stop this reclaim loop, increase the priority and
> +	 *   go to sleep if necessary to allow IO a chance to complete.
> +	 *   This avoids kswapd going into a busy loop in shrink_zone()
>  	 */
> -	if (!nr_reclaimed && !nr_scanned)
> -		return false;
> +	if (!nr_reclaimed) {
> +		if (current_is_kswapd())
> +			return false;
> +		else if (!nr_scanned)
> +			return false;
> +	}
>
> /*
> * If we have not reclaimed enough pages for compaction and the
> @@ -1924,8 +1931,13 @@ restart:
>
>  	/* reclaim/compaction might need reclaim to continue */
>  	if (should_continue_reclaim(zone, nr_reclaimed,
> -					sc->nr_scanned - nr_scanned, sc))
> +					sc->nr_scanned - nr_scanned, sc)) {
> +		/* Throttle direct reclaimers if congested */
> +		if (!current_is_kswapd())
> +			wait_iff_congested(zone, BLK_RW_ASYNC, HZ/10);
> +
>  		goto restart;
> +	}
>
>  	throttle_vm_writeout(sc->gfp_mask);
> }
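
For anyone following along, the early-exit test changed by the first hunk can be modelled in plain userspace C. This is only a rough sketch, not kernel code: the function names reclaim_should_stop() and reclaim_should_stop_old() are made up for illustration, and the is_kswapd parameter is an assumed stand-in for current_is_kswapd(). It covers only the single check touched by the patch, not the rest of should_continue_reclaim().

```c
#include <stdbool.h>

/*
 * Sketch of the patched check. Returns true when this check alone
 * would end the reclaim loop (the kernel code returns false from
 * should_continue_reclaim() in that case).
 *
 * is_kswapd is a hypothetical stand-in for current_is_kswapd().
 */
static bool reclaim_should_stop(bool is_kswapd,
				unsigned long nr_reclaimed,
				unsigned long nr_scanned)
{
	/* Some pages were reclaimed: this check does not stop the loop. */
	if (nr_reclaimed)
		return false;

	/* kswapd: nothing reclaimed is enough to stop. */
	if (is_kswapd)
		return true;

	/* Direct reclaim: stop only if nothing was scanned either. */
	return nr_scanned == 0;
}

/* The pre-patch check, for comparison: the caller does not matter. */
static bool reclaim_should_stop_old(unsigned long nr_reclaimed,
				    unsigned long nr_scanned)
{
	return !nr_reclaimed && !nr_scanned;
}
```

The only case where the two versions differ is kswapd scanning pages without reclaiming any: the old check keeps looping, while the patched check stops the loop, which is the busy loop in shrink_zone() that the patch comment describes.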

