2.0.36 oopses: clear_inode->grow_files->file_table_init-> die_if_kernel

Paul Wouters (paul@xtdnet.nl)
Fri, 12 Feb 1999 23:42:14 +0100 (MET)


On one machine we're getting more and more oopses, and I now suspect
failing hardware. Can anyone comment? It's a pretty standard machine: a
Pentium with three network cards and an IDE disk, running stock 2.0.36
(the oopses started under 2.0.30, after which we upgraded to 2.0.36).

None of the oopses has been fatal (so far!), but some processes died (cron,
sshd, sendmail). Luckily the machine also had a restricted telnetd running :)

Can anyone confirm (or rule out) that this is caused by a dying hard disk?
I'm very happy that, despite all this, the machine is still running. Good
job there, people :)

Paul

general protection: 0000
CPU: 0
EIP: 0010:[<00123792>]
EFLAGS: 00010206
eax: fffffc00 ebx: 015e3894 ecx: 00326c0c edx: 0000000a
esi: 015e3810 edi: 00000000 ebp: 00000000 esp: 00457f94
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process sendmail (pid: 6195, process nr: 38, stackpage=00457000)
Stack: bfffe7cc 00000000 000001a4 001237e8 00326c0c 00000000 bfffe944 bfffe70c
01c52d00 0010a915 bfffe7cc 00000000 000001a4 00000000 bfffe944 bfffe70c
ffffffda 0000002b 0000002b 0000002b 0000002b 00000005 4007466c 00000023
Call Trace: [<001237e8>] [<0010a915>]
Code: 39 91 d8 01 00 00 7f 0a b8 e8 ff ff ff 5b 5e 5f c3 90 0f ab

/usr/local/sbin/ksymoops System.map < crash3
Using `System.map' to map addresses to symbols.

>>EIP: 123792 <clear_inode+22/110>
Trace: 1237e8 <clear_inode+78/110>
Trace: 10a915 <die_if_kernel+105/2e0>
Code: 123792 <clear_inode+22/110>
Code: 123792 <clear_inode+22/110> 39 91 d8 01 00 cmpl %edx,0x1d8(%ecx)
Code: 123798 <clear_inode+28/110> 7f 0a jg 1237a4 <clear_inode+34/110>
Code: 12379a <clear_inode+2a/110> b8 e8 ff ff ff movl $0xffffffe8,%eax
Code: 1237a5 <clear_inode+35/110> 5b popl %ebx
Code: 1237a6 <clear_inode+36/110> 5e popl %esi
Code: 1237a7 <clear_inode+37/110> 5f popl %edi
Code: 1237a8 <clear_inode+38/110> c3 ret
Code: 1237a9 <clear_inode+39/110> 90 nop
Code: 1237aa <clear_inode+3a/110> 0f ab 00 btsl %eax,(%eax)
Code: 1237b3 <clear_inode+43/110> 90 nop
Code: 1237b4 <clear_inode+44/110> 90 nop
Code: 1237b5 <clear_inode+45/110> 90 nop
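
For readers unfamiliar with the decoding step: ksymoops resolves each raw
address to the nearest preceding symbol in System.map and prints the offset
into that symbol. A minimal sketch of that lookup in Python follows. The
symbol start addresses are not copied from the real System.map; they are
back-computed from the decoded output above (for example 0x123792 - 0x22 =
0x123770 for clear_inode), and the helper name `resolve` is hypothetical.

```python
import bisect

# Symbol start addresses back-computed from the decoded oops above
# (address minus offset); they are assumptions, not real System.map lines.
SYMBOLS = sorted([
    (0x10A810, "die_if_kernel"),   # 0x10a915 - 0x105
    (0x123770, "clear_inode"),     # 0x123792 - 0x22
])
STARTS = [addr for addr, _ in SYMBOLS]


def resolve(address):
    """Return 'symbol+offset' for the nearest symbol at or below address."""
    index = bisect.bisect_right(STARTS, address) - 1
    if index < 0:
        return None  # address lies below every known symbol
    start, name = SYMBOLS[index]
    return f"{name}+{address - start:x}"


if __name__ == "__main__":
    for raw in (0x123792, 0x1237E8, 0x10A915):
        print(f"{raw:x} <{resolve(raw)}>")
```

With only two symbols in the table, any address above clear_inode resolves to
clear_inode; a real lookup would load every line of System.map.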

This one happened about an hour later:

general protection: 0000
CPU: 0
EIP: 0010:[<00123792>]
EFLAGS: 00010206
eax: fffffc00 ebx: 015e3894 ecx: 008fcc0c edx: 0000000a
esi: 015e3810 edi: 00000000 ebp: 00000000 esp: 004b2f94
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process sendmail (pid: 6253, process nr: 38, stackpage=004b2000)
Stack: bfffe7cc 00000000 000001a4 001237e8 008fcc0c 00000000 bfffe944 bfffe70c
01c52d00 0010a915 bfffe7cc 00000000 000001a4 00000000 bfffe944 bfffe70c
ffffffda 0000002b 0000002b 0000002b 0000002b 00000005 4007466c 00000023
Call Trace: [<001237e8>] [<0010a915>]
Code: 39 91 d8 01 00 00 7f 0a b8 e8 ff ff ff 5b 5e 5f c3 90 0f ab

Giving:

>>EIP: 123792 <clear_inode+22/110>
Trace: 1237e8 <clear_inode+78/110>
Trace: 10a915 <die_if_kernel+105/2e0>
Code: 123792 <clear_inode+22/110>
Code: 123792 <clear_inode+22/110> 39 91 d8 01 00 cmpl %edx,0x1d8(%ecx)
Code: 123798 <clear_inode+28/110> 7f 0a jg 1237a4 <clear_inode+34/110>
Code: 12379a <clear_inode+2a/110> b8 e8 ff ff ff movl $0xffffffe8,%eax
Code: 1237a5 <clear_inode+35/110> 5b popl %ebx
Code: 1237a6 <clear_inode+36/110> 5e popl %esi
Code: 1237a7 <clear_inode+37/110> 5f popl %edi
Code: 1237a8 <clear_inode+38/110> c3 ret
Code: 1237a9 <clear_inode+39/110> 90 nop
Code: 1237aa <clear_inode+3a/110> 0f ab 00 btsl %eax,(%eax)
Code: 1237b3 <clear_inode+43/110> 90 nop
Code: 1237b4 <clear_inode+44/110> 90 nop
Code: 1237b5 <clear_inode+45/110> 90 nop
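
The two sendmail oopses above look nearly identical. A small sketch (Python,
with hypothetical helper names) makes the comparison explicit by diffing the
two register dumps. It only reports which registers differ; it does not by
itself say whether hardware or software is at fault.

```python
import re

# Register lines copied from the two sendmail oopses above.
OOPS_1 = """eax: fffffc00 ebx: 015e3894 ecx: 00326c0c edx: 0000000a
esi: 015e3810 edi: 00000000 ebp: 00000000 esp: 00457f94"""
OOPS_2 = """eax: fffffc00 ebx: 015e3894 ecx: 008fcc0c edx: 0000000a
esi: 015e3810 edi: 00000000 ebp: 00000000 esp: 004b2f94"""


def parse_registers(text):
    """Map register name -> integer value from 'name: hex' pairs."""
    return {name: int(value, 16)
            for name, value in re.findall(r"(\w+):\s+([0-9a-f]{8})", text)}


def differing_registers(first, second):
    """Return the names of registers whose values differ, sorted."""
    a, b = parse_registers(first), parse_registers(second)
    return sorted(name for name in a if a[name] != b.get(name))


if __name__ == "__main__":
    print(differing_registers(OOPS_1, OOPS_2))
```

For these two dumps the only differing registers are ecx and esp; EIP and
everything else match.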

Shortly followed by another one:
CPU: 0
EIP: 0010:[<0012499c>]
EFLAGS: 00010282
eax: 018efc00 ebx: fffffffe ecx: 00000000 edx: 00000000
esi: 00000001 edi: 00008000 ebp: 01875f8c esp: 01875f3c
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process sshd (pid: 273, process nr: 12, stackpage=01875000)
Stack: 0012c2da 018efc00 019044c8 00000000 01268000 00000005 00000000 018efc00
01268005 00000006 00123689 01268000 00000001 00000000 01875f8c 00000000
00000000 00000005 00000000 00000000 00000000 0012380f 01268000 00000000
Call Trace: [<0012c2da>] [<00123689>] [<0012380f>] [<0010a915>]
Code: 53 8b 5c 24 08 85 db 0f 84 81 01 00 00 80 bb 88 00 00 00 00

Giving:

>>EIP: 12499c <grow_files+8c/90>
Trace: 12c2da <connect_select+a/1b0>
Trace: 123689 <insert_inode_hash+29/40>
Trace: 12380f <clear_inode+9f/110>
Trace: 10a915 <die_if_kernel+105/2e0>
Code: 12499c <grow_files+8c/90>
Code: 12499c <grow_files+8c/90> 53 pushl %ebx
Code: 12499d <grow_files+8d/90> 8b 5c 24 08 movl 0x8(%esp,1),%ebx
Code: 1249a1 <file_table_init+1/10> 85 db testl %ebx,%ebx
Code: 1249a3 <file_table_init+3/10> 0f 84 81 01 00 je 124b2a <__wait_on_buffer+8a/100>
Code: 1249af <file_table_init+f/10> 80 bb 88 00 00 cmpb $0x0,0x88(%ebx)

30 seconds later, things got pretty bad :)

Unable to handle kernel paging request at virtual address 01bc2ff0
current->tss.cr3 = 01bc1000,
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<0010a965>]
EFLAGS: 00010246
eax: 00000246 ebx: 01bc3018 ecx: bffffda0 edx: ffff0ff0
esi: 36c48571 edi: bffffd90 ebp: bffffd60 esp: 01bc2fbc
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process crond (pid: 48, process nr: 11, stackpage=01bc2000)
Stack: 00000000 bffffda0 bffffd90 36c48571 bffffd90 bffffd60 36c48576 0000002b
0000002b 0000002b 0000002b 0000000d 40075182 00000023 00000246 bffffd5c
0000002b
Call Trace:
Code: 66 83 7c 24 34 10 74 35 fb 0d 00 02 00 00 25 ff bf ff ff 89

Giving:

>>EIP: 10a965 <die_if_kernel+155/2e0>
Code: 10a965 <die_if_kernel+155/2e0>
Code: 10a965 <die_if_kernel+155/2e0> 66 83 7c 24 34 cmpw $0x10,0x34(%esp,1)
Code: 10a96b <die_if_kernel+15b/2e0> 74 35 je 10a9a2 <die_if_kernel+192/2e0>
Code: 10a96d <die_if_kernel+15d/2e0> fb sti
Code: 10a96e <die_if_kernel+15e/2e0> 0d 00 02 00 00 orl $0x200,%eax
Code: 10a979 <die_if_kernel+169/2e0> 25 ff bf ff ff andl $0xffffbfff,%eax
Code: 10a97e <die_if_kernel+16e/2e0> 89 00 movl %eax,(%eax)
Code: 10a986 <die_if_kernel+176/2e0> 90 nop
Code: 10a987 <die_if_kernel+177/2e0> 90 nop
Code: 10a988 <die_if_kernel+178/2e0> 90 nop

And the last one:

general protection: 0000
CPU: 0
EIP: 0010:[<00123799>]
EFLAGS: 00010216
eax: ffffff80 ebx: 01ab1894 ecx: 01911810 edx: 00000007
esi: 01ab1810 edi: bfffe7d8 ebp: 00000000 esp: 008fcf98
ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
Process sendmail (pid: 6259, process nr: 11, stackpage=008fc000)
Stack: 080c96c8 000001a4 001237e8 01911810 080c96c8 bfffe724 bfffe6c4 008fcfbc
0010a915 bfffe7d8 00000000 000001a4 080c96c8 bfffe724 bfffe6c4 ffffffda
0000002b 0000002b 0000002b 0000002b 00000005 4007466c 00000023 00000202
Call Trace: [<001237e8>] [<0010a915>]
Code: 0a b8 e8 ff ff ff 5b 5e 5f c3 90 0f ab 96 84 00 00 00 0f b3

Giving another clear_inode problem:

>>EIP: 123799 <clear_inode+29/110>
Trace: 1237e8 <clear_inode+78/110>
Trace: 10a915 <die_if_kernel+105/2e0>
Code: 123799 <clear_inode+29/110>
Code: 123799 <clear_inode+29/110> 0a b8 e8 ff ff orb 0xffffffe8(%eax),%bh
Code: 12379f <clear_inode+2f/110> 5b popl %ebx
Code: 1237a0 <clear_inode+30/110> 5e popl %esi
Code: 1237a1 <clear_inode+31/110> 5f popl %edi
Code: 1237a2 <clear_inode+32/110> c3 ret
Code: 1237a9 <clear_inode+39/110> 90 nop
Code: 1237aa <clear_inode+3a/110> 0f ab 96 84 00 btsl %edx,0x84(%esi)
Code: 1237b1 <clear_inode+41/110> 0f b3 00 btrl %eax,(%eax)
Code: 1237ba <clear_inode+4a/110> 90 nop
Code: 1237bb <clear_inode+4b/110> 90 nop
Code: 1237bc <clear_inode+4c/110> 90 nop
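
One detail worth checking in this last oops is whether EIP 0x123799 lands on
an instruction boundary. The sketch below walks the code bytes from the first
oops (which start at 0x123792), using instruction lengths read off the raw
Code: line there. The lengths table and function names are illustrative
assumptions, not a general x86 decoder.

```python
# Instruction lengths for the bytes at clear_inode+0x22, read off the raw
# Code: line of the first oops (39 91 d8 01 00 00 | 7f 0a | b8 e8 ff ff ff | ...).
# This hand-written table is an assumption for illustration only.
INSTRUCTIONS = [
    ("cmpl %edx,0x1d8(%ecx)", 6),
    ("jg", 2),
    ("movl $0xffffffe8,%eax", 5),
    ("popl %ebx", 1),
    ("popl %esi", 1),
    ("popl %edi", 1),
    ("ret", 1),
    ("nop", 1),
]


def instruction_starts(base, instructions):
    """Return the start address of each instruction, beginning at base."""
    starts, address = [], base
    for _, length in instructions:
        starts.append(address)
        address += length
    return starts


def on_boundary(eip, base=0x123792, instructions=INSTRUCTIONS):
    """True if eip is the first byte of one of the listed instructions."""
    return eip in instruction_starts(base, instructions)


if __name__ == "__main__":
    for eip in (0x123792, 0x123799):
        state = "boundary" if on_boundary(eip) else "mid-instruction"
        print(f"{eip:x}: {state}")
```

Under these assumptions 0x123799 falls inside the two-byte jg at 0x123798,
which would explain why ksymoops decodes this oops differently from the
first two.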
