Re: Oopses with both recent 2.4.x kernels and 2.6.x kernels

From: Marcelo Tosatti
Date: Thu Feb 05 2004 - 18:56:00 EST




On Tue, 3 Feb 2004, Stian Jordet wrote:

> Hello,
>
> I have a server which was running 2.4.18 and 2.4.19 for almost 200 days
> each, without problems. After an upgrade to 2.4.22, the box haven't been
> up for 30 days in a row. This happened early november. I have caputered
> oopses with both 2.4.23 and 2.6.1 which I have sent decoded to the list,
> but have never got any reply.
>
> I have ran memtest86 on the box, no errors. What else can be the
> problem? I could of course go back to 2.4.19, which I know worked fine,
> but I there have been some fixed security holes since then...
>
> Any thoughts?

Stian,

I have seen your 2.4.x oopses and they seemed odd. The faults were
happening in different functions (mostly inside VM "freeing" , due to
what seems to be random crap in memory:

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000021
c0132e86
*pde = 00000000

eax: 00000000 ebx: 00000009 ecx: 000001d2 edx: 00000012
esi: 00000000 edi: c17e38c0 ebp: c1047a00 esp: c86cbdb4

>>EIP; c0132e86 <sync_page_buffers+e/a4> <=====

>>edi; c17e38c0 <_end+14b5844/bd23f84>
>>ebp; c1047a00 <_end+d19984/bd23f84>
>>esp; c86cbdb4 <_end+839dd38/bd23f84>

Trace; c0132fdc <try_to_free_buffers+c0/ec>

Code; c0132e86 <sync_page_buffers+e/a4>
00000000 <_EIP>:
Code; c0132e86 <sync_page_buffers+e/a4> <=====
0: f6 43 18 06 testb $0x6,0x18(%ebx) <=====
Code; c0132e8a <sync_page_buffers+12/a4>
4: 74 7c je 82 <_EIP+0x82> c0132f08
<sync_page_buffers+90/a4>
Code; c0132e8c <sync_page_buffers+14/a4>
6: b8 07 00 00 00 mov $0x7,%eax
Code; c0132e91 <sync_page_buffers+19/a4>




<1>Unable to handle kernel NULL pointer dereference at virtual address
00000028
c015e3a2
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c015e3a2>] Not tainted
EFLAGS: 00010203

eax: 0100004d ebx: 00000000 ecx: 000001d2 edx: 00000000

Code; c015e3a2 <journal_try_to_free_buffers+5a/98>
00000000 <_EIP>:
Code; c015e3a2 <journal_try_to_free_buffers+5a/98> <=====
0: 8b 5b 28 mov 0x28(%ebx),%ebx <=====
Code; c015e3a5 <journal_try_to_free_buffers+5d/98>
3: f6 42 19 04 testb $0x4,0x19(%edx)
Code; c015e3a9 <journal_try_to_free_buffers+61/98>
7: 74 17 je 20 <_EIP+0x20> c015e3c2
<journal_try_to_free_buffers+7a/98>

And other similar oopses.

Are you sure there is nothing messing up the hardware ?

How long have you ran memtest86? It can, sometimes, take a long to showup
errors.

The 2.6.x oopses on the same hardware is also a useful source of
information.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/