Re: signal_cache slab corruption.

From: Hugh Dickins
Date: Thu Mar 16 2006 - 15:25:48 EST


On Tue, 14 Mar 2006, Dave Jones wrote:
> On Tue, Mar 14, 2006 at 09:22:35AM +0000, Hugh Dickins wrote:
> > On Mon, 13 Mar 2006, Dave Jones wrote:
> >
> > > I got into the office today to find my workstation that was running
> > > a kernel based on .16rc5-git9 was totally unresponsive.
> > > After rebooting, I found this in the logs.
> > >
> > > slab signal_cache: invalid slab found in partial list at ffff8100e3a48080 (11/11).
> > > slab signal_cache: invalid slab found in partial list at ffff81007ecc6100 (11/11).
> > > slab: Internal list corruption detected in cache 'signal_cache'(11), slabp ffff810037ec0998(12). Hexdump:
> > >
> > > 000: c0 60 d9 7e 00 81 ff ff 00 61 cc 7e 00 81 ff ff
> > > 010: a8 09 ec 37 00 81 ff ff a8 09 ec 37 00 81 ff ff
> > > 020: 0c 00 00 00 00 00 00 00 57 d0 1d 07 01 00 00 00
> > > 030: 00 00 00 00
> > > ----------- [cut here ] --------- [please bite here ] ---------
> > > Kernel BUG at mm/slab.c:2598

Thanks for the diff: that fitted together better.

I spent several hours poring over your report,
but came to no useful conclusions - sorry.

As you probably noticed yourself, struct slab's colouroff and s_mem
have been overwritten by a list_head; but I can't deduce anything
illuminating from that.

Less obvious and more interesting is that the struct kmem_cache is
being corrupted during the course of the output: it starts off saying
in several places that cachep->num is 11, but ends up printing only
one kmem_bufctl_t (aside from "free"), implying cachep->num was 1 by
then. That may relate to how 11/11 was wrongly in the partial list.

I'd be interested in the signal_cache line from your /proc/slabinfo,
to see what cachep->num usually is in your configuration. But it's
an idle interest: I won't have anything interesting to say, whatever
it is...

Anyone else?

Hugh