Re: [BUG] kernel BUG at mm/memcontrol.c:1074!

From: Hugh Dickins
Date: Thu Jan 19 2012 - 00:16:54 EST


On Thu, 19 Jan 2012, KAMEZAWA Hiroyuki wrote:
> On Wed, 18 Jan 2012 19:41:44 -0800 (PST)
> Hugh Dickins <hughd@xxxxxxxxxx> wrote:
> >
> > I notice that, unlike Linus's git, this linux-next still has
> > mm-isolate-pages-for-immediate-reclaim-on-their-own-lru.patch in.
> >
> > I think that was well capable of oopsing in mem_cgroup_lru_del_list(),
> > since it didn't always know which lru a page belongs to.
> >
> > I'm going to be optimistic and assume that was the cause.
> >
> Hmm, because the log hits !memcg at lru "del", the page should be added
> to LRU somewhere and the lru must be determined by pc->mem_cgroup.
>
> Once set, pc->mem_cgroup is not cleared, just overwritten. AFAIK, there is
> only one chance to set pc->mem_cgroup to NULL... initialization.
> I wonder why it hits lru_del() rather than lru_add()...
> ................
>
> Ahhhh, ok, it seems you are right. The patch has the following kind of code:
> ==
> +static void pagevec_putback_immediate_fn(struct page *page, void *arg)
> +{
> +	struct zone *zone = page_zone(page);
> +
> +	if (PageLRU(page)) {
> +		enum lru_list lru = page_lru(page);
> +		list_move(&page->lru, &zone->lru[lru].list);
> +	}
> +}
> ==
> This will bypass mem_cgroup_lru_add(), so we can see the bug in lru_del()
> rather than lru_add().

I've not thought it through in detail. Your questioning reminds me that
the worst I saw from that patch was updating of the wrong counts, leading
to underflow, then livelock from the mismatch between an empty list and
an enormous count. I never saw an oops from it, so I may be mistaken.
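
To make the count mismatch concrete, here is a minimal userspace sketch.
It is not kernel code and not taken from the patch: every name in it
(model_lru, model_page, model_counts and the model_* functions) is
hypothetical, and it assumes only that a page is moved to another list
without the per-list counts being told, after which the accounted delete
decrements the count of whichever list the page now appears to be on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-list model; none of these names are kernel APIs. */
enum model_lru { MODEL_LRU_INACTIVE, MODEL_LRU_IMMEDIATE, MODEL_NR_LRU };

struct model_page {
	enum model_lru on_list;		/* list the page is currently linked on */
};

struct model_counts {
	unsigned long nr[MODEL_NR_LRU];	/* per-list page counts */
};

/* Accounted add: link the page and bump that list's count. */
static void model_lru_add(struct model_counts *c, struct model_page *p,
			  enum model_lru lru)
{
	p->on_list = lru;
	c->nr[lru]++;
}

/* Unaccounted move: relink the page but leave the counts alone
 * (stands in for a bare list_move() in this model). */
static void model_raw_move(struct model_page *p, enum model_lru lru)
{
	p->on_list = lru;
}

/* Accounted delete: decrement the count of the list the page appears
 * to be on. Unsigned arithmetic wraps if that count is already zero. */
static void model_lru_del(struct model_counts *c, struct model_page *p)
{
	c->nr[p->on_list]--;
}

/* Run add -> unaccounted move -> delete, and return the final count
 * of the list the page was moved to. */
static unsigned long model_underflow_demo(struct model_counts *c)
{
	struct model_page page;
	size_t i;

	assert(c != NULL);
	for (i = 0; i < MODEL_NR_LRU; i++)
		c->nr[i] = 0;

	model_lru_add(c, &page, MODEL_LRU_INACTIVE);	/* inactive count: 1 */
	model_raw_move(&page, MODEL_LRU_IMMEDIATE);	/* counts unchanged */
	model_lru_del(c, &page);			/* immediate count: 0 - 1 wraps */

	return c->nr[MODEL_LRU_IMMEDIATE];
}
```

In this model the page ends up on no list at all, yet one count is stuck
at 1 and the other has wrapped to the largest unsigned long: an empty
list paired with an enormous count, which is the kind of mismatch a
scanner could spin on.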

>
> Another question is who pushes pages to LRU before setting pc->mem_cgroup..
> Anyway, I think we need to fix memcg to be LRU_IMMEDIATE aware.

I don't think so: Mel agreed that the patch could not go forward as is,
without an additional pageflag, and asked Andrew to drop it from mmotm
in mail on 29th December. I didn't notice an mm-commits message to say
akpm did drop it, and marc is blacked out in protest for today, so I
cannot check; but certainly akpm left it out of his push to Linus.

Oh, and Mel noticed another bug in it on the 30th, that the PageLRU
check in the function you quote above is wrong: see PATCH 11/11 thread.

Hugh