Re: [resend][PATCH] mm, vmscan: fix do_try_to_free_pages() livelock

From: Michal Hocko
Date: Thu Jun 14 2012 - 11:25:35 EST


On Thu 14-06-12 04:13:12, kosaki.motohiro@xxxxxxxxx wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>
> Currently, do_try_to_free_pages() can enter livelock. Because of,
> now vmscan has two conflicted policies.
>
> 1) kswapd sleep when it couldn't reclaim any page when reaching
> priority 0. This is because to avoid kswapd() infinite
> loop. That said, kswapd assume direct reclaim makes enough
> free pages to use either regular page reclaim or oom-killer.
> This logic makes kswapd -> direct-reclaim dependency.
> 2) direct reclaim continue to reclaim without oom-killer until
> kswapd turn on zone->all_unreclaimable. This is because
> to avoid too early oom-kill.
> This logic makes direct-reclaim -> kswapd dependency.
>
> In worst case, direct-reclaim may continue to page reclaim forever
> when kswapd sleeps forever.
>
> We can't turn on zone->all_unreclaimable from direct reclaim path
> because direct reclaim path don't take any lock and this way is racy.
>
> Thus this patch removes zone->all_unreclaimable field completely and
> recalculates zone reclaimable state every time.
>
> Note: we can't take the idea that direct-reclaim see zone->pages_scanned
> directly and kswapd continue to use zone->all_unreclaimable. Because, it
> is racy. commit 929bea7c71 (vmscan: all_unreclaimable() use
> zone->all_unreclaimable as a name) describes the detail.
>
> Reported-by: Aaditya Kumar <aaditya.kumar.30@xxxxxxxxx>
> Reported-by: Ying Han <yinghan@xxxxxxxxxx>
> Cc: Nick Piggin <npiggin@xxxxxxxxx>
> Acked-by: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxx>
> Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> Cc: Mel Gorman <mel@xxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: Minchan Kim <minchan.kim@xxxxxxxxx>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>

Looks good, just one comment below:

Reviewed-by: Michal Hocko <mhocko@xxxxxxx>

[...]
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index eeb3bc9..033671c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
[...]
> @@ -1936,8 +1936,8 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> if (global_reclaim(sc)) {
> if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
> continue;
> - if (zone->all_unreclaimable &&
> - sc->priority != DEF_PRIORITY)
> + if (!zone_reclaimable(zone) &&
> + sc->priority != DEF_PRIORITY)

Not exactly a hot path, but it would still be nice to test the priority
first, as that test is cheaper (maybe the compiler is clever enough to
reorder this, since both expressions are independent and have no
side effects, but...).
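
The reordering suggested above can be sketched in isolation as follows.
This is only an illustration, not the kernel code: struct zone_stub,
zone_reclaimable_stub(), skip_zone() and the call counter are
hypothetical stand-ins, and the "scanned < reclaimable * 6" check is an
assumption about what zone_reclaimable() computes. The point is that
with && short-circuiting, putting the cheap priority comparison first
means the helper is never called at DEF_PRIORITY.

```c
#include <stdbool.h>

#define DEF_PRIORITY 12	/* same value as the kernel's default scan priority */

/* Hypothetical stand-in for the fields of struct zone used here. */
struct zone_stub {
	unsigned long pages_scanned;
	unsigned long reclaimable_pages;
};

/* Counts helper invocations so the short-circuit is observable. */
static int zone_reclaimable_calls;

/* Assumed shape of the check: recalculated on every call, no cached flag. */
static bool zone_reclaimable_stub(const struct zone_stub *zone)
{
	zone_reclaimable_calls++;
	return zone->pages_scanned < zone->reclaimable_pages * 6;
}

/*
 * Cheap integer comparison first; the helper only runs when the
 * priority has already dropped below DEF_PRIORITY.
 */
static bool skip_zone(const struct zone_stub *zone, int priority)
{
	return priority != DEF_PRIORITY && !zone_reclaimable_stub(zone);
}
```

The result is the same as with the original operand order, since both
operands are free of side effects in the real code; only the amount of
work done on the first pass at DEF_PRIORITY differs.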

[...]
> @@ -2393,8 +2388,7 @@ loop_again:
> if (!populated_zone(zone))
> continue;
>
> - if (zone->all_unreclaimable &&
> - sc.priority != DEF_PRIORITY)
> + if (!zone_reclaimable(zone) && sc.priority != DEF_PRIORITY)
> continue;

Same here

>
> /*
> @@ -2443,14 +2437,13 @@ loop_again:
> */
> for (i = 0; i <= end_zone; i++) {
> struct zone *zone = pgdat->node_zones + i;
> - int nr_slab, testorder;
> + int testorder;
> unsigned long balance_gap;
>
> if (!populated_zone(zone))
> continue;
>
> - if (zone->all_unreclaimable &&
> - sc.priority != DEF_PRIORITY)
> + if (!zone_reclaimable(zone) && sc.priority != DEF_PRIORITY)
> continue;

Same here

[...]
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic