Re: Deadlock possibly caused by too_many_isolated.

From: Neil Brown
Date: Wed Sep 15 2010 - 04:44:50 EST


On Wed, 15 Sep 2010 16:28:43 +0800
Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:

> Neil,
>
> Sorry for the rushed and imaginary ideas this morning..
>
> > @@ -1101,6 +1101,12 @@ static unsigned long shrink_inactive_lis
> > int lumpy_reclaim = 0;
> >
> > while (unlikely(too_many_isolated(zone, file, sc))) {
> > + if ((sc->gfp_mask & GFP_IOFS) != GFP_IOFS)
> > + /* Not allowed to do IO, so mustn't wait
> > + * on processes that might try to
> > + */
> > + return SWAP_CLUSTER_MAX;
> > +
>
> The above patch should behavior like this: it returns SWAP_CLUSTER_MAX
> to cheat all the way up to believe "enough pages have been reclaimed".
> So __alloc_pages_direct_reclaim() see non-zero *did_some_progress and
> go on to call get_page_from_freelist(). That normally fails because
> the task didn't really scanned the LRU lists. However it does have the
> possibility to succeed -- when so many processes are doing concurrent
> direct reclaims, it may luckily get one free page reclaimed by other
> tasks. What's more, if it does fail to get a free page, the upper
> layer __alloc_pages_slowpath() will be repeat recalling
> __alloc_pages_direct_reclaim(). So, sooner or later it will succeed in
> "stealing" a free page reclaimed by other tasks.
>
> In summary, the patch behavior for !__GFP_IO/FS is
> - won't do any page reclaim
> - won't fail the page allocation (unexpected)
> - will wait and steal one free page from others (unreasonable)
>
> So it will address the problem you encountered, however it sounds
> pretty unexpected and illogical behavior, right?
>
> I believe this patch will address the problem equally well.
> What do you think?

Thank you for the detailed explanation. Is agree with your reasoning and
now understand why your patch is sufficient.

I will get it tested and let you know how that goes.

Thanks,
NeilBrown


>
> Thanks,
> Fengguang
> ---
>
> mm: Avoid possible deadlock caused by too_many_isolated()
>
> Neil finds that if too_many_isolated() returns true while performing
> direct reclaim we can end up waiting for other threads to complete their
> direct reclaim. If those threads are allowed to enter the FS or IO to
> free memory, but this thread is not, then it is possible that those
> threads will be waiting on this thread and so we get a circular
> deadlock.
>
> some task enters direct reclaim with GFP_KERNEL
> => too_many_isolated() false
> => vmscan and run into dirty pages
> => pageout()
> => take some FS lock
> => fs/block code does GFP_NOIO allocation
> => enter direct reclaim again
> => too_many_isolated() true
> => waiting for others to progress, however the other
> tasks may be circular waiting for the FS lock..
>
> The fix is to let !__GFP_IO and !__GFP_FS direct reclaims enjoy higher
> priority than normal ones, by honouring them higher throttle threshold.
>
> Now !__GFP_IO/FS reclaims won't be waiting for __GFP_IO/FS reclaims to
> progress. They will be blocked only when there are too many concurrent
> !__GFP_IO/FS reclaims, however that's very unlikely because the IO-less
> direct reclaims is able to progress much more faster, and they won't
> deadlock each other. The threshold is raised high enough for them, so
> that there can be sufficient parallel progress of !__GFP_IO/FS reclaims.
>
> Reported-by: NeilBrown <neilb@xxxxxxx>
> Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> ---
> mm/vmscan.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> --- linux-next.orig/mm/vmscan.c 2010-09-15 11:58:58.000000000 +0800
> +++ linux-next/mm/vmscan.c 2010-09-15 15:36:14.000000000 +0800
> @@ -1141,36 +1141,39 @@ int isolate_lru_page(struct page *page)
> return ret;
> }
>
> /*
> * Are there way too many processes in the direct reclaim path already?
> */
> static int too_many_isolated(struct zone *zone, int file,
> struct scan_control *sc)
> {
> unsigned long inactive, isolated;
> + int ratio;
>
> if (current_is_kswapd())
> return 0;
>
> if (!scanning_global_lru(sc))
> return 0;
>
> if (file) {
> inactive = zone_page_state(zone, NR_INACTIVE_FILE);
> isolated = zone_page_state(zone, NR_ISOLATED_FILE);
> } else {
> inactive = zone_page_state(zone, NR_INACTIVE_ANON);
> isolated = zone_page_state(zone, NR_ISOLATED_ANON);
> }
>
> - return isolated > inactive;
> + ratio = sc->gfp_mask & (__GFP_IO | __GFP_FS) ? 1 : 8;
> +
> + return isolated > inactive * ratio;
> }
>
> /*
> * TODO: Try merging with migrations version of putback_lru_pages
> */
> static noinline_for_stack void
> putback_lru_pages(struct zone *zone, struct scan_control *sc,
> unsigned long nr_anon, unsigned long nr_file,
> struct list_head *page_list)
> {

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/