kernel panics at google

From: David desJardins (desj@google.com)
Date: Thu Jan 20 2000 - 01:12:33 EST


Google has been having a lot of trouble recently with our webservers
crashing. We're certain that this particular problem isn't due to some
sort of bad hardware (e.g., memory errors): it happens on all machines
indiscriminately when we try to use them as webservers. It happens a
lot, but since the machines crash, and they don't have consoles, it's
hard for us to save the kernel messages. But we have a few.

If I'm interpreting the kernel oops correctly, the error occurs in
tcp_retrans_try_collapse. I've attached below the oops, after
attempting to run it through ksymoops. (I think I got the right
symbols, but I'm not absolutely certain.)

These systems are running 2.2.13 with some patches.

We could really use informed suggestions about what could be triggering
this, or how we might try to work around it. Thanks very much in
advance for any help.

  -- David desJardins

----------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address 00000066
current->tss.cr3 = 0010100, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0158892>]
EFLAGS: 00010246
eax: 00000000 ebx: cc239460 ecx: 00000000 edx: 00000000
esi: caf116f0 edi: cc239460 ebp: c01c3de0 esp: c01c3dcc
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c01c3000)
Stack: 000005b4 00000000 c0b61ae0 c01462cb c0b61ae0 caf11640 c0158be7 caf11640
       cc239460 000005b4 caf11640 caf116f0 caf11694 caf11640 c0158d0d caf11640
       cc239460 caf116f0 caadd001 0000000e c01560f5 caf11640 caf116f0 caf11640
Call Trace: [<c01462cb>] [<c0158be7>] [<c0158d0d>] [<c01560f5>] [<c0157190>] [<c015bf1f>] [<c015c2c2>]
       [<c014f8ca>] [<c014fb26>] [<c01480a1>] [<c0117795>] [<c010a127>] [<c0109d88>] [<c010780d>] [<c0106000>]
       [<c0107830>] [<c0108f48>] [<c0106000>] [<c010607b>] [<c0106000>] [<c0100176>]
Code: 80 79 66 00 74 11 8b 81 88 00 00 00 83 38 01 0f 95 c0 25 ff

>>EIP: c0158892 <tcp_retrans_try_collapse+3a/208>
Trace: c01462cb <__kfree_skb+97/a0>
Trace: c0158be7 <tcp_retransmit_skb+a3/164>
Trace: c0158d0d <tcp_xmit_retransmit_queue+65/e4>
Trace: c01560f5 <tcp_ack+289/370>
Trace: c0157190 <tcp_rcv_established+180/5e8>
Trace: c015bf1f <tcp_v4_do_rcv+6f/178>
Trace: c015c2c2 <tcp_v4_rcv+29a/320>
Trace: c014f8ca <ip_local_deliver+16a/1b8>
Trace: c0107830 <sys_idle+14/24>
Code: c0158892 <tcp_retrans_try_collapse+3a/208> 00000000 <_EIP>: <===
Code: c0158892 <tcp_retrans_try_collapse+3a/208> 0: 80 79 66 00 cmpb $0x0,0x66(%ecx) <===
Code: c0158896 <tcp_retrans_try_collapse+3e/208> 4: 74 11 je c01588a9 <tcp_retrans_try_collapse+51/208>
Code: c0158898 <tcp_retrans_try_collapse+40/208> 6: 8b 81 88 00 00 00 movl 0x88(%ecx),%eax
Code: c015889e <tcp_retrans_try_collapse+46/208> c: 83 38 01 cmpl $0x1,(%eax)
Code: c01588a1 <tcp_retrans_try_collapse+49/208> f: 0f 95 c0 setne %al
Code: c01588a4 <tcp_retrans_try_collapse+4c/208> 12: 25 ff 00 00 00 andl $0xff,%eax

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing

----------------------------------------
Unable to handle kernel NULL pointer dereference at virtual address 00000066
current->tss.cr3 = 0010100, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0158892>]
EFLAGS: 00010246
eax: 00000000 ebx: cea7a1e0 ecx: 00000000 edx: 00000000
esi: c40cabf0 edi: cea7a1e0 ebp: c01c3deo esp: c01c3dcc
ds: 0018 es: 0018 ss: 0018
Process swapper (pid: 0, process nr: 0, stackpage=c01c3000)
Stack: 00000058 00000000 c716f660 c01462cb c716f660 c40cab40 c0158be7 c40cab40
       cea7a1e0 00000058 c40cab40 c40cabf0 c40cab94 c40cab40 c0158d0d c40cab40
       cea7a1eo c40cabf0 c7089d01 0000000e c01560f5 c40cab40 c40cabf0 c40cab40
Call Trace: [<c01462cb>] [<c0158be7>] [<c0158d0d>] [<c01560f5>] [<c0146226>] [<c0157190>] [<c015bf1f>]
       [<c015c2c2>] [<c014f8ca>] [<c014fb26>] [<c01480a1>] [<c0117795>] [<c010a127>] [<c0109d88>] [<c010780d>]
       [<c0106000>] [<c0107830>] [<c0108f48>] [<c0106000>] [<c010607b>] [<c0106000>] [<c0100176>]
Code: 80 79 66 00 74 11 8b 81 88 00 00 00 83 38 01 0f 95 c0 25 ff

>>EIP: c0158892 <tcp_retrans_try_collapse+3a/208>
Trace: c01462cb <__kfree_skb+97/a0>
Trace: c0158be7 <tcp_retransmit_skb+a3/164>
Trace: c0158d0d <tcp_xmit_retransmit_queue+65/e4>
Trace: c01560f5 <tcp_ack+289/370>
Trace: c0146226 <kfree_skbmem+32/40>
Trace: c0157190 <tcp_rcv_established+180/5e8>
Trace: c015bf1f <tcp_v4_do_rcv+6f/178>
Trace: c015c2c2 <tcp_v4_rcv+29a/320>
Trace: c0106000 <get_options+0/74>
Code: c0158892 <tcp_retrans_try_collapse+3a/208> 00000000 <_EIP>: <===
Code: c0158892 <tcp_retrans_try_collapse+3a/208> 0: 80 79 66 00 cmpb $0x0,0x66(%ecx) <===
Code: c0158896 <tcp_retrans_try_collapse+3e/208> 4: 74 11 je c01588a9 <tcp_retrans_try_collapse+51/208>
Code: c0158898 <tcp_retrans_try_collapse+40/208> 6: 8b 81 88 00 00 00 movl 0x88(%ecx),%eax
Code: c015889e <tcp_retrans_try_collapse+46/208> c: 83 38 01 cmpl $0x1,(%eax)
Code: c01588a1 <tcp_retrans_try_collapse+49/208> f: 0f 95 c0 setne %al
Code: c01588a4 <tcp_retrans_try_collapse+4c/208> 12: 25 ff 00 00 00 andl $0xff,%eax

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing

----------------------------------------

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jan 23 2000 - 21:00:21 EST