Re: Linux 3.3-rc4

From: Jiri Kosina
Date: Fri Feb 24 2012 - 13:01:15 EST


On Fri, 24 Feb 2012, Linus Torvalds wrote:

> > The machine has gone through several suspend-resume cycles before this
> > happened, so it might well also be some memory corruption caused by a
> > random driver.
>
> I almost think it is, because "file->dentry" should never be NULL in a
> mapping afaik. Especially as your "mapping" certainly isn't NULL (it's
> in %r12, so you can see it in your register dump).
>
> This isn't some unusual code sequence either, so I don't see it as
> some random latent bug that is just very unlikely and hard to trigger
> in that code itself.
>
> I'll think about it, but my first reaction is "memory corruption". Do
> you think you could try to run with a kernel that has SLAB debugging
> and poisoning on? If it's a stale pointer dereference that has cleared
> that dentry, that _might_ show it closer to the actual bug (rather
> than a long time later when the NULL dereference happens).

I have been running a DEBUG_SLAB kernel since I first hit the bug, but
nothing has popped up yet. It seems undebuggable so far.

On the other hand, I wouldn't blame the HW for a bit-flip, as it was a
clear NULL pointer (plus a 0x30 offset), not random garbage.
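
To illustrate the reasoning, here is a minimal userspace sketch. The
struct layout, the helper names and the sample pointer value used
further down are all made up for illustration; this is not the kernel's
real structure, only a stand-in with a member at offset 0x30. It shows
why reading a member through a NULL base faults at exactly 0x30, and
why a single flipped bit cannot turn a typical pointer into NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the kernel's real structure: 0x30 bytes of
 * leading fields followed by the member that gets dereferenced. */
struct demo_object {
	unsigned char header[0x30];
	void *member;
};

/* Assumption made explicit: the member sits at offset 0x30. */
static_assert(offsetof(struct demo_object, member) == 0x30,
	      "member is expected at offset 0x30");

/* Address that would be accessed when reading base->member. */
static uint64_t member_address(uint64_t base)
{
	return base + offsetof(struct demo_object, member);
}

/* Number of bit positions in which two pointer values differ. */
static unsigned int bits_differing(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	unsigned int n = 0;

	while (x) {
		n += (unsigned int)(x & 1);
		x >>= 1;
	}
	return n;
}
```

With a NULL base, member_address(0) is 0x30, which matches the shape of
the faulting address. A single bit-flip changes exactly one bit, so
bits_differing() between the original and the corrupted pointer would
be 1; going from a pointer with many bits set (say, the invented value
0xffff880012345678) to all zeroes needs far more than that.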

--
Jiri Kosina
SUSE Labs