Re: 2.6.28-rc4 mem_cgroup_charge_common panic

From: KAMEZAWA Hiroyuki
Date: Mon Nov 17 2008 - 20:09:21 EST


On Mon, 17 Nov 2008 13:30:08 -0800
Badari Pulavarty <pbadari@xxxxxxxxxx> wrote:

> On Fri, 2008-11-14 at 13:10 +0900, KAMEZAWA Hiroyuki wrote:
> > On Thu, 13 Nov 2008 10:53:24 -0800
> > Badari Pulavarty <pbadari@xxxxxxxxxx> wrote:
> > > I tried mmtom + startpfn fix + this fix + notifier fix. Didn't help.
> > > I am not using SLUB (using SLAB). Yes. I am testing "real" memory
> > > remove (not just offline/online), since it executes more code of
> > > freeing memmap etc.
> > >
> > > The code that is panicking is the list_add() in mem_cgroup_add_list().
> > > I will debug it further.
> > >
> >
> > Considering the difference between "real" memory hotplug and logical
> > offline/online, I found this. I hope this fixes the bug.
> > But I can't test this myself..
> >
> > Thanks,
> > -Kame
> >
>
> Kame,
>
> With this patch I am able to run tests without any issues.
>
> Sorry for delayed response, I wanted to make sure test runs fine over
> the weekend.
>
> Tested-by: Badari Pulavarty <pbadari@xxxxxxxxxx>
>
Wow, Thank you!

-Kame


> Thanks,
> Badari
>
>
> > ==
> > Fixes for memcg/memory hotplug.
> >
> >
> > While memory hotplug allocates/frees the memmap, page_cgroup doesn't free
> > page_cgroup at OFFLINE when page_cgroup was allocated via bootmem.
> > (Because freeing bootmem requires special care.)
> >
> > So, if page_cgroup was allocated from bootmem and the memmap is freed and
> > reallocated by memory hotplug, page_cgroup->page == page no longer holds
> > and we have to update it.
> >
> > But the current MEM_ONLINE handler doesn't check this, and doesn't update
> > page_cgroup->page when it is not necessary to allocate page_cgroup.
> >
> > I also noticed that MEM_ONLINE can be called against "part of a section".
> > So, freeing page_cgroup at CANCEL_ONLINE would cause trouble
> > (it could free page_cgroup that is still in use).
> > Don't roll back at CANCEL.
> >
> > One more: the memory hotplug notifier chain is currently stopped by slub,
> > because slub sets NOTIFY_STOP_MASK in its return value. So page_cgroup's
> > callback is never called (it now has lower priority than slub's).
> >
> > I think this slub behavior is not intentional (a bug), and this fixes it.
> >
> >
> > Another approach to consider for page_cgroup allocation:
> > - free page_cgroup at OFFLINE even if it came from bootmem,
> > and remove the special handling. But that requires more changes.
> >
> >
> > Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> >
> > ---
> > mm/page_cgroup.c | 39 +++++++++++++++++++++++++++------------
> > mm/slub.c | 6 ++++--
> > 2 files changed, 31 insertions(+), 14 deletions(-)
> >
> > Index: mmotm-2.6.28-Nov10/mm/page_cgroup.c
> > ===================================================================
> > --- mmotm-2.6.28-Nov10.orig/mm/page_cgroup.c
> > +++ mmotm-2.6.28-Nov10/mm/page_cgroup.c
> > @@ -104,18 +104,30 @@ int __meminit init_section_page_cgroup(u
> > unsigned long table_size;
> > int nid, index;
> >
> > - if (section->page_cgroup)
> > - return 0;
> > + if (!section->page_cgroup) {
> >
> > - nid = page_to_nid(pfn_to_page(pfn));
> > - table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
> > - if (slab_is_available()) {
> > - base = kmalloc_node(table_size, GFP_KERNEL, nid);
> > - if (!base)
> > - base = vmalloc_node(table_size, nid);
> > - } else {
> > - base = __alloc_bootmem_node_nopanic(NODE_DATA(nid), table_size,
> > + nid = page_to_nid(pfn_to_page(pfn));
> > + table_size = sizeof(struct page_cgroup) * PAGES_PER_SECTION;
> > + if (slab_is_available()) {
> > + base = kmalloc_node(table_size, GFP_KERNEL, nid);
> > + if (!base)
> > + base = vmalloc_node(table_size, nid);
> > + } else {
> > + base = __alloc_bootmem_node_nopanic(NODE_DATA(nid),
> > + table_size,
> > PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
> > + }
> > + } else {
> > + /*
> > + * We don't have to allocate page_cgroup again, but
> > + * address of memmap may be changed. So, we have to initialize
> > + * again.
> > + */
> > + base = section->page_cgroup + pfn;
> > + table_size = 0;
> > + /* check address of memmap is changed or not. */
> > + if (base->page == pfn_to_page(pfn))
> > + return 0;
> > }
> >
> > if (!base) {
> > @@ -204,19 +216,22 @@ static int page_cgroup_callback(struct n
> > ret = online_page_cgroup(mn->start_pfn,
> > mn->nr_pages, mn->status_change_nid);
> > break;
> > - case MEM_CANCEL_ONLINE:
> > case MEM_OFFLINE:
> > offline_page_cgroup(mn->start_pfn,
> > mn->nr_pages, mn->status_change_nid);
> > break;
> > case MEM_GOING_OFFLINE:
> > + case MEM_CANCEL_ONLINE:
> > break;
> > case MEM_ONLINE:
> > case MEM_CANCEL_OFFLINE:
> > break;
> > }
> >
> > - ret = notifier_from_errno(ret);
> > + if (ret)
> > + ret = notifier_from_errno(ret);
> > + else
> > + ret = NOTIFY_OK;
> >
> > return ret;
> > }
> > Index: mmotm-2.6.28-Nov10/mm/slub.c
> > ===================================================================
> > --- mmotm-2.6.28-Nov10.orig/mm/slub.c
> > +++ mmotm-2.6.28-Nov10/mm/slub.c
> > @@ -3220,8 +3220,10 @@ static int slab_memory_callback(struct n
> > case MEM_CANCEL_OFFLINE:
> > break;
> > }
> > -
> > - ret = notifier_from_errno(ret);
> > + if (ret)
> > + ret = notifier_from_errno(ret);
> > + else
> > + ret = NOTIFY_OK;
> > return ret;
> > }
> >
> >
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/