Oops - hard crash in 2.2.15 - tcp_keepalive - more data

From: Whit Blauvelt (whit@transpect.com)
Date: Mon May 22 2000 - 18:13:18 EST


Hi all,

Yet another crash. The system is the same as in the crashes reported last
week, except that I'm no longer running the delack patch. Instead I'm
using a patch that Oleg Drokin kindly sent me (see below), on the theory
that something was stomping on the memory used by tcp_keepalive. If I
understand his patch right, this crash proves that wasn't the problem.
Progress....

Ideas? I'll try anything to get this system stable. I'm guessing I'm using
kernel features that are not fully compatible with each other, but it's
beyond me to sort out which. I'm not including any features in the kernel
that I don't need here, and it's a production system, so I'm not in a
great position to turn them off one by one, especially as it sometimes
takes a full week to crash (right up there with NT - no I didn't say that,
please forgive me if I did).

Whit

Unable to handle kernel NULL pointer dereference at virtual address 00000050
current->tss.cr3 = 00101000, %cr3 = 00101000
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c01792c2>]
EFLAGS: 00010096
eax: 00000000 ebx: c30b3180 ecx: c30b33e8 edx: 000000fe
esi: 00000001 edi: 00000000 ebp: 0000bae1 esp: c0215ef8
ds: 0018 es: 0018 ss: 0018
Stack: 00000007 00000000 c02095b0 00000001 0000000c 01ec24f3 00000000 00000282
        00000000 c0179641 c02095b0 00000000 c0179610 00000001 c0215f48 c01113a9
        00000000 00000001 c0252384 00000000 c0215f60 c0117b99 00000000 c0240000
Call Trace: [<c0179641>] [<c0179610>] [<c01113a9>] [<c0117b99>] [<c010a2cd>] [<c0109f9c>] [<c01078a9>]
                [<c0106000>] [<c01078cc>] [<c01090fc>] [<c0106000>] [<c010607b>] [<c0106000>] [<c0100175>]
Code: 8b 40 50 ff d0 83 c4 10 83 fe 01 75 06 ff 0d 0c 41 25 c0 66

>>EIP: c01792c2 <tcp_keepalive+e6/18c>
Trace: c0179641 <tcp_sltimer_handler+31/70>
Trace: c0179610 <tcp_sltimer_handler+0/70>
Trace: c01113a9 <timer_bh+2e9/330>
Trace: c0117b99 <do_bottom_half+49/64>
Trace: c010a2cd <do_IRQ+39/40>
Trace: c0109f9c <common_interrupt+18/20>
Trace: c01078a9 <cpu_idle+61/70>
Trace: c0106000 <get_options+0/74>
Code: c01792c2 <tcp_keepalive+e6/18c> 00000000 <_EIP>: <===
Code: c01792c2 <tcp_keepalive+e6/18c> 0: 8b 40 50 mov 0x50(%eax),%eax <===
Code: c01792c5 <tcp_keepalive+e9/18c> 3: ff d0 call *%eax
Code: c01792c7 <tcp_keepalive+eb/18c> 5: 83 c4 10 add $0x10,%esp
Code: c01792ca <tcp_keepalive+ee/18c> 8: 83 fe 01 cmp $0x1,%esi
Code: c01792cd <tcp_keepalive+f1/18c> b: 75 06 jne c01792d5 <tcp_keepalive+f9/18c>
Code: c01792cf <tcp_keepalive+f3/18c> d: ff 0d 0c 41 25 c0 decl 0xc025410c
Code: c01792d5 <tcp_keepalive+f9/18c> 13: 66 data16

Aiee, killing interrupt handler
Kernel panic: Attempted to kill the idle task!
In swapper task - not syncing
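
A note on reading the decode above: the faulting instruction is
"mov 0x50(%eax),%eax" with eax = 00000000, so the CPU tried to read address
0 + 0x50, which matches the reported virtual address 00000050. The next
instruction, "call *%eax", shows the loaded value was about to be used as a
function pointer. The sketch below only illustrates that arithmetic in
userspace C. The struct demo_ops and its field names are hypothetical and
are not the real 2.2.15 layout; which kernel pointer was actually NULL is
not established by this trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout for illustration only: any struct whose function
 * pointer sits at offset 0x50. This is NOT a real 2.2.15 kernel struct. */
struct demo_ops {
    unsigned char pad[0x50];
    void (*handler)(void *arg);
};

/* Address read by "mov disp(%reg),%reg": base register plus displacement. */
uintptr_t effective_address(uintptr_t base, size_t disp)
{
    return base + disp;
}

/* Address that faults when the struct pointer itself is NULL (base == 0),
 * as eax was in the oops. */
uintptr_t null_deref_address(void)
{
    return effective_address(0, offsetof(struct demo_ops, handler));
}

/* Compare the sketch with the value printed in the oops header. */
void check_against_oops(void)
{
    assert(null_deref_address() == 0x50);
}
```

Calling check_against_oops() passes silently, since the computed address
equals the 00000050 in the first line of the oops.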

On Thu, 18 May 2000 23:23:03 +0300, Oleg Drokin <green@tiger.thukraine.com> wrote:

> Well, try this patch,
> it compiles and even boots ;)
>
>
> --- net/ipv4/tcp_timer.c.orig Thu May 18 22:57:41 2000
> +++ net/ipv4/tcp_timer.c Thu May 18 23:10:44 2000
> @@ -21,6 +21,7 @@
>   */
>
>  #include <net/tcp.h>
> +#include <asm/system.h>
>
>  int sysctl_tcp_syn_retries = TCP_SYN_RETRIES;
>  int sysctl_tcp_keepalive_time = TCP_KEEPALIVE_TIME;
> @@ -361,8 +362,11 @@
>  {
>  	static int chain_start = 0;
>  	int count = 0;
> +	unsigned long flags;
>  	int i;
> -
> +
> +	save_flags(flags);
> +	cli();
>  	for(i = chain_start; i < (chain_start + ((tcp_ehash_size/2) >> 2)); i++) {
>  		struct sock *sk = tcp_ehash[i];
>  		while(sk) {
> @@ -377,6 +381,7 @@
>  out:
>  	chain_start = ((chain_start + ((tcp_ehash_size/2)>>2)) &
>  		       ((tcp_ehash_size/2) - 1));
> +	restore_flags(flags);
>  }
>
>  /*
>
>
> Bye,
> Oleg
> --
> System Administrator
> Tank Hill Ukraine
>

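For readers unfamiliar with the pattern in the patch above: save_flags(),
cli() and restore_flags() bracket the hash-chain walk so that nothing
running from an interrupt can touch the data while the loop runs. The
sketch below is only a userspace analogy, assuming POSIX signals as a
stand-in for interrupts: it saves the signal mask, blocks all signals,
does the work, then restores the saved mask. The names shared_counter,
update_with_signals_blocked and read_counter are made up for this example,
and none of this is kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Stand-in for data that an asynchronous handler might also modify. */
static volatile long shared_counter;

/* Userspace analogy of:  save_flags(flags); cli(); ...; restore_flags(flags);
 * Returns 0 on success, -1 if a signal-mask call fails. */
int update_with_signals_blocked(long delta)
{
    sigset_t all, saved;

    if (sigfillset(&all) != 0)
        return -1;

    /* Like save_flags() + cli(): remember the old mask, block everything
     * that can be blocked. */
    if (sigprocmask(SIG_BLOCK, &all, &saved) != 0)
        return -1;

    shared_counter += delta;    /* critical section */

    /* Like restore_flags(): put back exactly the mask we started with. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;

    return 0;
}

long read_counter(void)
{
    return shared_counter;
}
```

Restoring the saved mask, rather than unblocking everything, mirrors why
the patch uses restore_flags() instead of an unconditional sti(): the
caller's previous state is preserved whatever it was.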


