Re: v2.6.26-rc9: kernel BUG at kernel/sched.c:5858!

From: Vegard Nossum
Date: Fri Jul 11 2008 - 01:49:48 EST


On Thu, Jul 10, 2008 at 10:16 PM, Dmitry Adamushko
<dmitry.adamushko@xxxxxxxxx> wrote:
> Yeah, it's possible that a caller of kmem_cache_alloc() ->
> slab_alloc() can be migrated on another CPU right after
> local_irq_restore() and before memset(). The initial CPU can go
> offline in the meantime (or the migration is a consequence of the CPU
> going offline), so its 'kmem_cache_cpu' structure gets freed (in
> slab_cpuup_callback).
>
> At some point the caller continues on another CPU, holding an
> obsolete pointer...
>
> does something like this help?

Nice :-)

By the way, this also explains the heavy corruption I was seeing (NULL
pointers in lists detected by list debugging, etc.); SLUB was doing a
HUGE memset of 0 on arbitrary memory, i.e. the memset effectively
became:

memset(object, 0, 0x1adadada);

...and in some cases the machine didn't crash inside SLUB but
carried on...
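
To make the failure mode concrete, here is a small userspace sketch (not
the actual patch, and not kernel code). The names cpu_cache,
simulate_cpu_offline, zero_len_racy and zero_len_snapshot are made up for
illustration, and the 0xda fill byte is an arbitrary stand-in for
"whatever ends up in the freed memory"; it is not meant to explain the
exact 0x1adadada value above. It only shows why a length read through the
stale pointer is garbage, and why snapshotting the length before the
window opens avoids that:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-CPU structure; only the field that
 * matters for the memset length is modelled. */
struct cpu_cache {
    uint32_t objsize;   /* size of one object in this cache */
};

/* Simulates the per-CPU structure being freed and reused once its CPU has
 * gone offline. Overwriting it in place keeps the demo free of a real
 * use-after-free. The 0xda byte is an arbitrary illustrative pattern. */
static void simulate_cpu_offline(struct cpu_cache *c)
{
    memset(c, 0xda, sizeof(*c));
}

/* Racy pattern: the structure is dereferenced again after the point where
 * the task may have been migrated. */
static uint32_t zero_len_racy(struct cpu_cache *c)
{
    /* ... object allocated, "interrupts re-enabled" here ... */
    simulate_cpu_offline(c);   /* migration + offline hit the window */
    return c->objsize;         /* stale read: length is now garbage */
}

/* Safer pattern: copy the field into a local while the structure is still
 * guaranteed valid, and use only the local afterwards. */
static uint32_t zero_len_snapshot(struct cpu_cache *c)
{
    uint32_t objsize = c->objsize;   /* read inside the "critical section" */

    /* ... object allocated, "interrupts re-enabled" here ... */
    simulate_cpu_offline(c);         /* same window, but c is not used again */
    return objsize;
}
```

Starting from a cpu_cache with objsize set to 64, zero_len_racy() hands
back a length made of the fill pattern, while zero_len_snapshot() still
returns 64, which is the length the memset should have used.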

I guess I should reload and try the latest -git now :-)

Thanks!


Vegard

--
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
-- E. W. Dijkstra, EWD1036