Re: [BUG] i386 2.6.18 cpu_up: attempt to bring up CPU 4 failed :kernel BUG at mm/slab.c:2698!

From: Andrew Morton
Date: Thu Sep 21 2006 - 23:16:57 EST


On Fri, 22 Sep 2006 11:24:27 +0900
KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:

> On Thu, 21 Sep 2006 18:34:03 -0700
> keith mannthey <kmannth@xxxxxxxxxx> wrote:
>
> > That unhappy caller in the chain is cpuup_callback in mm/slab.c. I am
> > still working out as to why, there is a lot going on if this function.
> >
> > > b) pageset_cpuup_callback()'s CPU_UP_CANCELED path possibly hasn't been
> > > tested before. I'd be guessing that we're not zeroing out the
> > > zone.pageset[] array when the `struct zone' is first allocated, but I
> > > don't immediately recall where that code lives.
> >
>
> How about here ?
> == at boot time in mm/page_alloc.c ==
> free_area_init_core()
> ->zone_pcp_init(zone);
> for (cpu = 0; cpu < NR_CPUS; cpu++) {
> #ifdef CONFIG_NUMA
> /* Early boot. Slab allocator not functional yet */
> zone_pcp(zone, cpu) = &boot_pageset[cpu];
> setup_pageset(&boot_pageset[cpu],0);
> #else
> setup_pageset(zone_pcp(zone,cpu), batch);
> #endif
> }
> ==================
>
> Not zero-cleared.
>

Actually, I'd point the finger at process_zones(). If that kmalloc()
fails, we leave garbage in the entries for the remaining zones. But
free_zone_pagesets() kfrees all of them, uncluding the garbage pointers.

So something like...

--- a/mm/page_alloc.c~a
+++ a/mm/page_alloc.c
@@ -1811,11 +1811,14 @@ static struct per_cpu_pageset boot_pages
*/
static int __cpuinit process_zones(int cpu)
{
- struct zone *zone, *dzone;
+ struct zone *zone;

- for_each_zone(zone) {
+ for_each_zone(zone)
+ zone_pcp(zone, cpu) = NULL;

- zone_pcp(zone, cpu) = kmalloc_node(sizeof(struct per_cpu_pageset),
+ for_each_zone(zone) {
+ zone_pcp(zone, cpu) =
+ kmalloc_node(sizeof(struct per_cpu_pageset),
GFP_KERNEL, cpu_to_node(cpu));
if (!zone_pcp(zone, cpu))
goto bad;
@@ -1824,17 +1827,16 @@ static int __cpuinit process_zones(int c

if (percpu_pagelist_fraction)
setup_pagelist_highmark(zone_pcp(zone, cpu),
- (zone->present_pages / percpu_pagelist_fraction));
+ (zone->present_pages / percpu_pagelist_fraction));
}

return 0;
bad:
- for_each_zone(dzone) {
- if (dzone == zone)
- break;
- kfree(zone_pcp(dzone, cpu));
- zone_pcp(dzone, cpu) = NULL;
+ for_each_zone(zone) {
+ kfree(zone_pcp(zone, cpu));
+ zone_pcp(zone, cpu) = NULL;
}
+ printk(KERN_EMERG "%s: kmalloc() failed\n", __FUNCTION__);
return -ENOMEM;
}

_


But why did the kmalloc() fail?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/