Re: lots of 2.2.4 oopses (NOT an egcs problem)

Bruce Harada (bruce@ask.ne.jp)
Mon, 29 Mar 1999 05:46:10 +0900

Linus Torvalds wrote:
> Note that it's likely that there is just a few places (or even just _one_)
> where Linux and egcs disagree about something. So it can be very setup-
> dependent, and it could also be a question of timing.
>
> For example, it could be a case where egcs actually does something that is
> correct ANSI C - re-orders stores to different memory locations because
> egcs did the alias analysis and determined that they cannot clash. And it
> may be that the kernel depends on the exact order of stores. That would be
> a kernel bug.
>
> Equally possible is that egcs just gets the alias calculations wrong, or
> that there is some other bug that just makes it generate bad code under
> certain circumstances. Finding out exactly what the problem is can be
> _very_ hard: the oops that was posted looked basically like memory
> corruption, so the bug was not actually likely to be anywhere close to
> where the crash actually happened.
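
Here is a minimal sketch of the kind of ordering dependency described above, assuming GCC (or a compiler that accepts GCC's inline asm syntax). A writer stores a value and then sets a flag that other code polls. The two stores go to different locations and only the flag is volatile, so the optimizer may emit them in either order; code that relies on the order has to say so explicitly. The struct msg, publish() and compiler_barrier() names are hypothetical. The empty asm statement with a "memory" clobber is the same idiom the kernel's barrier() macro is built on. It only restrains the compiler; on SMP or on CPUs that reorder stores, a hardware memory barrier may be needed as well.

```c
/* Hypothetical message slot: 'data' must be stored before 'ready' is set. */
struct msg {
    int data;
    volatile int ready;   /* polled by a reader elsewhere */
};

/* Compiler-only barrier (GCC syntax). The "memory" clobber tells the
 * compiler that memory may have changed, so it must not move loads or
 * stores across this point. It emits no machine instruction. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

void publish(struct msg *m, int value)
{
    m->data = value;      /* non-volatile store: without the barrier the  */
    compiler_barrier();   /* optimizer could sink it below the flag store */
    m->ready = 1;
}
```

The reader needs the matching ordering on its side: load ready first, then data, with a barrier between the two loads.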

Sorry, but it doesn't look like an egcs problem. I've been getting
similar oopses since I upgraded to 2.2.4-ac1 (although 2.2.3-ac4 and
2.2.1-ac2 also had some suspicious freezes; they happened while I was
away from the console, and unfortunately there was nothing in the logs).

The system is:
PentiumII @ 400MHz x 2 (SMP kernel)
Slackware 3.6 (libc5)
gcc version 2.7.2.3
ld version 2.8.2 (with BFD 2.8.1.0.23)

The kernel falls over about once an hour under medium use, and about
once a day if it's left alone. I haven't found any single cause, but at
one point the kernel got itself stuck in a state where it would
produce oopses on demand: either ps or top would trigger one about once
every three tries, and any attempt at make would stop with various
parsing errors or a sig11 (SIGSEGV).

I've included the ksymoops output from the first three oopses that I was
able to get. The entire series (about a dozen) is at
http://www.ask.ne.jp/~bruce/trace.out_all

--------------------------------------------------

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
<1>current->tss.cr3 = 0c22d000, %cr3 = 0c22d000
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU: 0
<4>EIP: 0010:[<c0133871>]
<4>EFLAGS: 00010286
<4>eax: 00000550 ebx: ffffffe8 ecx: d72b70aa edx: d72b7ddb
<4>esi: 07272fb5 edi: 00000001 ebp: 00000000 esp: c1185f04
<4>ds: 0018 es: 0018 ss: 0018
<4>Process ps (pid: 2455, process nr: 88, stackpage=c1185000)
<4>Stack: 00000001 cff78ac0 c0231d88 c0978006 07272fb5 00000006 c012ea14 cff78ac0
<4> c1185f4c c1185f4c c012ec40 cff78ac0 c1185f4c c0978000 ffffffe9 00000001
<4> c0978000 c1b7c130 c0978006 00000006 07272fb5 c012edc5 c0978000 00000000
<4>Call Trace: [<c012ea14>] [<c012ec40>] [<c012edc5>] [<c01271d6>] [<c0127457>] [<c0108d74>]
<4>Code: 8b 6d 00 8b 74 24 18 39 73 48 75 eb 8b 74 24 24 39 73 0c 75

>>EIP: c0133871 <d_lookup+65/dc>
Trace: c012ea14 <cached_lookup+10/4c>
Trace: c012ec40 <lookup_dentry+fc/1b8>
Trace: c012edc5 <open_namei+6d/2f4>
Trace: c01271d6 <filp_open+46/f8>
Trace: c0127457 <sys_open+53/b4>
Trace: c0108d74 <system_call+34/40>
Code: c0133871 <d_lookup+65/dc> 00000000 <_EIP>: <===
Code: c0133871 <d_lookup+65/dc> 0: 8b 6d 00 movl 0x0(%ebp),%ebp <===
Code: c0133874 <d_lookup+68/dc> 3: 8b 74 24 18 movl 0x18(%esp,1),%esi
Code: c0133878 <d_lookup+6c/dc> 7: 39 73 48 cmpl %esi,0x48(%ebx)
Code: c013387b <d_lookup+6f/dc> a: 75 eb jne c0133868 <d_lookup+5c/dc>
Code: c013387d <d_lookup+71/dc> c: 8b 74 24 24 movl 0x24(%esp,1),%esi
Code: c0133881 <d_lookup+75/dc> 10: 39 73 0c cmpl %esi,0xc(%ebx)
Code: c0133884 <d_lookup+78/dc> 13: 75 00 jne c0133886 <d_lookup+7a/dc>

733 warnings issued. Results may not be reliable.
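
The register dump is consistent with a corrupted hash chain. The faulting instruction, movl 0x0(%ebp),%ebp, loads a list's next pointer while %ebp = 0, and %ebx = ffffffe8 is exactly 0 - 0x18: that is what a list_entry()-style conversion produces from a NULL list pointer if the list member sits 0x18 bytes into the structure. Assuming the dcache hash chains are the kernel's usual circular lists, a NULL link should never appear in them. The sketch below is a hypothetical, simplified walk over such a circular chain (struct node, struct item, chain_lookup() and entry_from_null() are illustrative names, not the 2.2 d_lookup() source). It shows where the faulting load comes from and how a NULL link could be detected instead of dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal circular doubly-linked list node, modelled loosely on list_head. */
struct node {
    struct node *next, *prev;
};

/* Hypothetical hashed item. The link is deliberately not the first member,
 * so converting a link pointer back to its item subtracts an offset. */
struct item {
    unsigned int hash;
    int payload[4];
    struct node link;
};

/* list_entry()/container_of-style conversion from link pointer to item. */
#define item_of(p) ((struct item *)((char *)(p) - offsetof(struct item, link)))

/* An empty circular list: the head points at itself. */
void chain_init(struct node *head)
{
    head->next = head->prev = head;
}

/* Insert n right after head. */
void chain_add(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Walk the chain looking for 'hash'. Returns the matching item or NULL.
 * Sets *corrupt if a NULL link is met, which a healthy circular list
 * never contains. */
struct item *chain_lookup(struct node *head, unsigned int hash, int *corrupt)
{
    struct node *tmp = head->next;

    *corrupt = 0;
    while (tmp != head) {
        if (tmp == NULL) {          /* the oopsing loop had no such check */
            *corrupt = 1;
            return NULL;
        }
        struct item *it = item_of(tmp);
        tmp = tmp->next;            /* in the oops, this load ran with tmp == NULL */
        if (it->hash == hash)
            return it;
    }
    return NULL;
}

/* 32-bit value of a list_entry()-style pointer computed from NULL for a
 * given member offset; e.g. an offset of 0x18 gives 0xffffffe8. */
uint32_t entry_from_null(uint32_t member_offset)
{
    return (uint32_t)0 - member_offset;
}
```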

--------------------------------------------------

<1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
<1>current->tss.cr3 = 04dba000, %cr3 = 04dba000
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU: 1
<4>EIP: 0010:[<c0133871>]
<4>EFLAGS: 00010286
<4>eax: 00000550 ebx: ffffffe8 ecx: d72b70aa edx: d72b7ddb
<4>esi: 07272fb5 edi: 00000001 ebp: 00000000 esp: c1185f04
<4>ds: 0018 es: 0018 ss: 0018
<4>Process ps (pid: 2464, process nr: 88, stackpage=c1185000)
<4>Stack: 00000001 cff78ac0 c0231d88 c0e5d006 07272fb5 00000006 c012ea14 cff78ac0
<4> c1185f4c c1185f4c c012ec40 cff78ac0 c1185f4c c0e5d000 ffffffe9 00000001
<4> c0e5d000 c0b89130 c0e5d006 00000006 07272fb5 c012edc5 c0e5d000 00000000
<4>Call Trace: [<c012ea14>] [<c012ec40>] [<c012edc5>] [<c01271d6>] [<c0127457>] [<c0108d74>]
<4>Code: 8b 6d 00 8b 74 24 18 39 73 48 75 eb 8b 74 24 24 39 73 0c 75

733 warnings issued. Results may not be reliable.

--------------------------------------------------

<1>Unable to handle kernel paging request at virtual address ef70c7a0
<1>current->tss.cr3 = 0c22d000, %cr3 = 0c22d000
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU: 1
<4>EIP: 0010:[<c012ead6>]
<4>EFLAGS: 00010286
<4>eax: c6dcef66 ebx: c6dcef66 ecx: 00000001 edx: ef70c73c
<4>esi: c6dcef66 edi: cd5c2cc0 ebp: cd5c2cc0 esp: c0b2df24
<4>ds: 0018 es: 0018 ss: 0018
<4>Process cpp (pid: 2499, process nr: 86, stackpage=c0b2d000)
<4>Stack: 00000001 c012ec88 cd5c2cc0 c6dcef66 00000001 c123d000 ffffffe9 00000001
<4> c123d000 c946c138 c123d000 0000000a 722087b0 c012edc5 c123d000 00000000
<4> 00000001 cf710720 ffffffe9 000001b6 c123d000 bfffee98 000081a4 c01271d6
<4>Call Trace: [<c012ec88>] [<c012edc5>] [<c01271d6>] [<c0127457>] [<c0108d74>]
<4>Code: 8b 42 64 85 c0 74 55 83 78 2c 00 74 4f bb 00 e0 ff ff 21 e3

>>EIP: c012ead6 <do_follow_link+16/84>
Trace: c012ec88 <lookup_dentry+144/1b8>
Trace: c012edc5 <open_namei+6d/2f4>
Trace: c01271d6 <filp_open+46/f8>
Trace: c0127457 <sys_open+53/b4>
Trace: c0108d74 <system_call+34/40>
Code: c012ead6 <do_follow_link+16/84> 00000000 <_EIP>: <===
Code: c012ead6 <do_follow_link+16/84> 0: 8b 42 64 movl 0x64(%edx),%eax <===
Code: c012ead9 <do_follow_link+19/84> 3: 85 c0 testl %eax,%eax
Code: c012eadb <do_follow_link+1b/84> 5: 74 55 je c012eb32 <do_follow_link+72/84>
Code: c012eadd <do_follow_link+1d/84> 7: 83 78 2c 00 cmpl $0x0,0x2c(%eax)
Code: c012eae1 <do_follow_link+21/84> b: 74 4f je c012eb32 <do_follow_link+72/84>
Code: c012eae3 <do_follow_link+23/84> d: bb 00 e0 ff ff movl $0xffffe000,%ebx
Code: c012eae8 <do_follow_link+28/84> 12: 21 e3 andl %esp,%ebx

732 warnings issued. Results may not be reliable.
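
The third oops points the same way. The faulting address ef70c7a0 is exactly %edx + 0x64 (ef70c73c + 0x64), so the bad value is the pointer in %edx itself, and the *pde = 00000000 line suggests it points into memory that is not mapped at all. The decoded code loads a pointer from offset 0x64, tests it for zero, then tests a field at offset 0x2c of what it loaded: the shape of a guarded chain such as p && p->ops && p->ops->fn. The sketch below is hypothetical (struct obj, struct obj_ops, can_follow() and fault_base() are illustrative names, not kernel code) and shows why such guards cannot help here: they reject NULL, but a garbage non-NULL pointer is still dereferenced.

```c
#include <stddef.h>
#include <stdint.h>

struct obj;

/* Hypothetical operations table; only the shape matters here. */
struct obj_ops {
    int (*fn)(const struct obj *);
};

/* Hypothetical object; other_fields stands in for whatever precedes the
 * ops pointer (in the oops, the loaded field sat at offset 0x64). */
struct obj {
    int other_fields[8];
    const struct obj_ops *ops;
};

/* Guarded chain in the style of the decoded code: load a pointer, test it
 * for NULL, then test a field of what was loaded. NULL is rejected, but a
 * non-NULL garbage 'o' is still dereferenced by o->ops. */
int can_follow(const struct obj *o)
{
    return o != NULL && o->ops != NULL && o->ops->fn != NULL;
}

/* Recover the base pointer from a fault address and the displacement used
 * by the faulting instruction, e.g. movl 0x64(%edx),%eax. */
uint32_t fault_base(uint32_t fault_addr, uint32_t field_offset)
{
    return fault_addr - field_offset;
}
```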

--------------------------------------------------

Bruce Harada
bruce@ask.ne.jp
