Re: 2.4.22-pre lockups (now decoded oops for pre10)

From: Andrea Arcangeli
Date: Mon Aug 18 2003 - 12:37:22 EST


On Wed, Aug 06, 2003 at 11:09:20AM +0200, Willy Tarreau wrote:
> On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote:
>
> > Code; c0144b14 <__remove_from_queues+14/30>
> > 00000000 <_EIP>:
> > Code; c0144b14 <__remove_from_queues+14/30> <=====
> > 0: 89 02 mov %eax,(%edx) <=====
> > Code; c0144b16 <__remove_from_queues+16/30>
> > 2: c7 41 30 00 00 00 00 movl $0x0,0x30(%ecx)
> > Code; c0144b1d <__remove_from_queues+1d/30>
> > 9: 89 4c 24 04 mov %ecx,0x4(%esp,1)
> > Code; c0144b21 <__remove_from_queues+21/30>
> > d: e9 7a ff ff ff jmp ffffff8c <_EIP+0xffffff8c>
> > Code; c0144b26 <__remove_from_queues+26/30>
> > 12: 8d 76 00 lea 0x0(%esi),%esi
>
> once again, it's *pprev=next which is is causing trouble, with pprev=6 this
> time (fs/buffer.c:523). There really seems to be something playing badly with
> this...
>
> I find amazing that such widely used portions of code only trigger panics on
> your system ! either it's a rare combinations of several components/drivers, or
> a strange hardware problem, although I can't imagine which (cpu? bus locking?).

normally it's bad ram (or anyways a problem with the memory) when bugs
triggers in that place reproducibly. the list walking trashes the l2 and
that put more stress on the ram. If it was random memory corruption
(software) it would more likely crash in different places (though it's
not guaranteed ;).

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/