Re: Deadlock possibly caused by too_many_isolated.

From: KOSAKI Motohiro
Date: Mon Oct 18 2010 - 01:05:16 EST


> On Wed, 15 Sep 2010 18:44:34 +1000
> Neil Brown <neilb@xxxxxxx> wrote:
>
> > On Wed, 15 Sep 2010 16:28:43 +0800
> > Wu Fengguang <fengguang.wu@xxxxxxxxx> wrote:
> >
> > > Neil,
> > >
> > > Sorry for the rushed and imaginary ideas this morning..
> > >
> > > > @@ -1101,6 +1101,12 @@ static unsigned long shrink_inactive_lis
> > > >          int lumpy_reclaim = 0;
> > > >
> > > >          while (unlikely(too_many_isolated(zone, file, sc))) {
> > > > +                if ((sc->gfp_mask & GFP_IOFS) != GFP_IOFS)
> > > > +                        /* Not allowed to do IO, so mustn't wait
> > > > +                         * on processes that might try to
> > > > +                         */
> > > > +                        return SWAP_CLUSTER_MAX;
> > > > +
> > >
> > > The above patch should behave like this: it returns SWAP_CLUSTER_MAX
> > > to cheat all the way up into believing "enough pages have been
> > > reclaimed". So __alloc_pages_direct_reclaim() sees a non-zero
> > > *did_some_progress and goes on to call get_page_from_freelist(). That
> > > normally fails because the task didn't really scan the LRU lists.
> > > However it does have the possibility to succeed -- when so many
> > > processes are doing concurrent direct reclaims, it may luckily get one
> > > free page reclaimed by other tasks. What's more, if it does fail to
> > > get a free page, the upper layer __alloc_pages_slowpath() will
> > > repeatedly call __alloc_pages_direct_reclaim(). So, sooner or later it
> > > will succeed in "stealing" a free page reclaimed by other tasks.
> > >
> > > In summary, the patch behavior for !__GFP_IO/FS is
> > > - won't do any page reclaim
> > > - won't fail the page allocation (unexpected)
> > > - will wait and steal one free page from others (unreasonable)
> > >
> > > So it will address the problem you encountered, however it sounds
> > > like pretty unexpected and illogical behavior, right?
> > >
> > > I believe this patch will address the problem equally well.
> > > What do you think?
> >
> > Thank you for the detailed explanation. I agree with your reasoning and
> > now understand why your patch is sufficient.
> >
> > I will get it tested and let you know how that goes.
>
> (sorry this has taken a month to follow up).
>
> Testing shows that this patch seems to work.
> The test load (essentially kernbench) doesn't deadlock any more, though it
> does get bogged down thrashing in swap so it doesn't make a lot more
> progress :-) I guess that is to be expected.
>
> One observation is that the kernbench generated 10%-20% more context switches
> with the patch than without. Is that to be expected?
>
> Do you have plans for sending this patch upstream?

Wow, I had thought this patch had been merged already. Wu, can you please
repost this one? And please add my and Neil's ack tags.

Thanks.
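
For anyone reading this thread later, the retry behavior Wu describes above
for the quoted (rejected) hunk can be sketched as a small user-space model.
This is only an illustration under stated assumptions, not kernel code:
struct reclaim_model, model_direct_reclaim(), model_get_page() and
model_alloc_slowpath() are hypothetical names, the model covers only a caller
without GFP_IOFS while too_many_isolated() stays true, and "another task
frees a page" is reduced to a fixed attempt number.

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32UL

/* Hypothetical model state; none of this is real kernel data. */
struct reclaim_model {
	int free_pages;          /* pages currently on the free list */
	int attempt_other_frees; /* attempt at which another task frees a page */
};

/*
 * Models the quoted hunk for a caller lacking GFP_IOFS while
 * too_many_isolated() is true: report progress without scanning anything.
 */
static unsigned long model_direct_reclaim(struct reclaim_model *m)
{
	(void)m; /* nothing is reclaimed on this path */
	return SWAP_CLUSTER_MAX;
}

/* Stand-in for get_page_from_freelist(): take a page if one exists. */
static int model_get_page(struct reclaim_model *m)
{
	if (m->free_pages > 0) {
		m->free_pages--;
		return 1;
	}
	return 0;
}

/*
 * Stand-in for the slowpath retry loop. Returns the attempt number on
 * which the allocation succeeded, or 0 if max_attempts ran out first.
 */
static int model_alloc_slowpath(struct reclaim_model *m, int max_attempts)
{
	assert(max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		/* Some other task's reclaim completes and frees one page. */
		if (attempt == m->attempt_other_frees)
			m->free_pages++;

		unsigned long did_some_progress = model_direct_reclaim(m);

		/* Progress is always claimed, so the loop never gives up early. */
		if (did_some_progress && model_get_page(m))
			return attempt;
	}
	return 0;
}
```

In this model the caller never reclaims anything itself and never fails the
allocation; it just loops until the page freed by someone else shows up,
which is the "wait and steal" behavior listed in Wu's summary.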


