Re: v2.6.26-rc9: kernel BUG at kernel/sched.c:5858!

From: Vegard Nossum
Date: Thu Jul 10 2008 - 15:49:22 EST


Okay, some more info on this one...

On Thu, Jul 10, 2008 at 4:16 PM, Vegard Nossum <vegard.nossum@xxxxxxxxx> wrote:
> BUG: unable to handle kernel paging request at da87d000
> IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0
> *pde = 28180163 *pte = 1a87d160
> Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
> EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0
> EIP is at kmem_cache_alloc+0xc7/0xe0
> EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
> ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
> DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068

The register %ecx looks innocent but is very important here. The disassembly:

mov %edx,%ecx
shr $0x2,%ecx
rep stos %eax,%es:(%edi) <-- the fault

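So %ecx has been loaded from %edx... which is 0x6b6b6b6b, i.e. four
bytes of POISON_FREE. (0x6b6b6b6b >> 2 == 0x1adadada. The oops shows
%ecx == 0x1adad71a because rep stos had already stored 0x3c0 dwords,
i.e. 0xf00 bytes, by the time %edi reached the faulting address
da87d000.)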
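
The arithmetic on the register dump can be checked with a few lines of
userspace C. This is only a sketch: the helper names stos_count() and
bytes_written() are made up for illustration, and the values are copied
from the oops above.

```c
#include <assert.h>
#include <stdint.h>

/* Four bytes of POISON_FREE (0x6b), as seen in %edx. */
#define POISON_FREE_WORD 0x6b6b6b6bu

/* Initial "rep stos" dword count: the byte length shifted right by 2,
 * mirroring "mov %edx,%ecx; shr $0x2,%ecx". */
uint32_t stos_count(uint32_t nbytes)
{
	return nbytes >> 2;
}

/* Bytes already stored, given the initial dword count and the count
 * still left in %ecx when the fault hit. */
uint32_t bytes_written(uint32_t initial, uint32_t remaining)
{
	assert(remaining <= initial);
	return (initial - remaining) * 4u;
}
```

stos_count(POISON_FREE_WORD) gives 0x1adadada, and
bytes_written(0x1adadada, 0x1adad71a) gives 0xf00, which is also the
distance from %ebx (da87c100) to the faulting %edi (da87d000).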

%ecx is the counter for the memset, from here:

memset(object, 0, c->objsize);

i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
Where did "c" come from? Uh-oh...

c = get_cpu_slab(s, smp_processor_id());

This looks like it has very much to do with CPU hotplug/unplug. Is
there a race between SLUB and CPU hotplug in which the per-CPU slab
structure is used after it has been freed?
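
To illustrate why a size field reading 0x6b6b6b6b points at a freed
object, here is a userspace sketch. It assumes a made-up structure,
struct fake_cpu_slab, which is not the kernel's real per-CPU slab
layout, and it only imitates free-poisoning with memset() rather than
actually freeing and reusing memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define POISON_FREE 0x6b	/* byte pattern written over freed objects */

/* Hypothetical stand-in for a per-CPU slab structure. */
struct fake_cpu_slab {
	void *freelist;
	uint32_t objsize;
};

/* Imitate what poisoning does to an object on free. */
void poison_object(void *obj, size_t size)
{
	memset(obj, POISON_FREE, size);
}

/* Read objsize through a stale reference after the object has been
 * poisoned; the caller then sees the poison pattern as a "size". */
uint32_t read_objsize_after_poison(void)
{
	struct fake_cpu_slab c = { .freelist = NULL, .objsize = 192 };

	poison_object(&c, sizeof(c));
	return c.objsize;
}
```

read_objsize_after_poison() returns 0x6b6b6b6b regardless of the
original objsize, which is the same value that ended up in %edx before
the memset().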


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036