Re: Repeated hard crashes of 2.2.* system related to tcp_keepalive

From: Andre Hedrick (andre@suse.com)
Date: Fri Feb 25 2000 - 02:19:23 EST


On Thu, 24 Feb 2000, Whit Blauvelt wrote:

> [1.] One line summary of the problem:
>
> Hard crashes of system related to tcp_keepalive
>
> [2.] Full description of the problem/report:
>
> The following frozen screen occurring between once a week and once a
> day (with only the most minor differences, has appeared running kernels
> 1.2.13, 1.2.14 and now 1.2.15pre9, through replacement of the memory,
> motherboard, CPU, brand of NICs, video card, and drive cables (what's
> constant now is the Tekram DC-390F SCSI controller, the IBM HD, a
> Western Digital HD used for /boot and swap, and the case and power supply):

Kurt Garloff can comment on the "Tekram DC-390F SCSI controller" and
remove that from the things in question.

> [3.] Keywords (i.e., modules, networking, kernel):
>
> tcp_keepalive crash
>
> [4.] Kernel version (from /proc/version):
>
> Linux version 2.2.15pre9 (root@china.patternbook.com) (gcc version
> egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)) #2 Mon Feb 21 00:01 :18
> EST 2000 (also 2.2.13 and 2.2.14 had identical problem)
>
> [5.] Output of Oops.. message (if applicable) with symbolic information
> resolved (see Documentation/oops-tracing.txt)
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000050
> Oops: 0000
> CPU: 0
> EIP: 0010 [<c0177d5c>]
> EFLAGS: 000010296
> eax: 00000000 ebx: c2f83380 ecx: c2f835e8 edx: 000000c0
> esi: 00000001 edi: 00000000 ebp: 00001585 esp: c021fef8
> ds: 018 es: 0018 ss: 0018
> Stack: 00000007 00000000 c02124b0 00000001 00000002 00b859ca 00000000 c01780e6
> c02124b0 00000000 c01780b4 000000ca c021ff4c c0111271 00000000 00000001
> c025c640 00b859bc 00000001 00000000 c021e000 c021ff60 c0117991 00000000
> Call Trace: [<c01780e6>] [<c01780b4>] [<c0111271>] [<c0117991>] [<c010ald7>] [<c0109e38>] [<c0107859>]
> [<c0106000>] [<c010787c>] [<c0109000>] [<c0106000>] [<c010607b>] [<c0106000>] [<c0100176>]
> Code: 8b 40 50 ff d0 83 c4 10 83 fe 10 75 06 ff 0d 40 e5 25 c0 66
>
> >>EIP: c0177d5c <tcp_keepalive+e0/180>
> Trace: c01780e6 <tcp_sltimer_handler+32/70>
> Trace: c01780b4 <tcp_sltimer_handler+0/70>
> Trace: c0111271 <timer_bh+331/388>
> Trace: c0117991 <do_bottom_half+45/64>
> Trace: c0106000 <get_options+0/74>
> Code: c0177d5c <tcp_keepalive+e0/180> 00000000 <_EIP>: <===
> Code: c0177d5c <tcp_keepalive+e0/180> 0: 8b 40 50 mov 0x50(%eax),%eax <===
> Code: c0177d5f <tcp_keepalive+e3/180> 3: ff d0 call *%eax
> Code: c0177d61 <tcp_keepalive+e5/180> 5: 83 c4 10 add $0x10,%esp
> Code: c0177d64 <tcp_keepalive+e8/180> 8: 83 fe 10 cmp $0x10,%esi
> Code: c0177d67 <tcp_keepalive+eb/180> b: 75 06 jne c0177d6f <tcp_keepalive+f3/180>
> Code: c0177d69 <tcp_keepalive+ed/180> d: ff 0d 40 e5 25 c0 decl 0xc025e540
> Code: c0177d6f <tcp_keepalive+f3/180> 13: 66 data16
>
> Aiee, killing interrupt handler
> Kernel panic: Attemted to kill the idle task:
> In swapper task - not syncing
>
> [6.] A small shell script or example program which triggers the
> problem (if possible)
>
> Not a clue. Happens at different times of day, in different load
> conditions.
>
> [7.] Environment
> [7.1.] Software (add the output of the ver_linux script here)
>
> Kernel modules 2.3.9
> Gnu C egcs-2.91.66
> Binutils 2.9.4.0.6
> Linux C Library 2.1.1
> Dynamic linker ldd (GNU libc) 2.1.1
> Procps 2.0.2
> Mount 2.9o
> Net-tools 1.52
> Console-tools 1999.03.02
> Sh-utils 1.16
> Modules Loaded 3c59x
>
> [7.2.] Processor information (from /proc/cpuinfo):
>
> processor : 0
> vendor_id : AuthenticAMD
> cpu family : 5
> model : 8
> model name : AMD-K6(tm) 3D processor
> stepping : 12
> cpu MHz : 367.512290 (it's a 450 but I'm clocking it slow)
> cache size : 64 KB
> fdiv_bug : no
> hlt_bug : no
> sep_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 1
> wp : yes
> flags : fpu vme de pse tsc msr mce cx8 sep mtrr pge mmx 3dnow
> bogomips : 732.36
>
> Note this also was happenning with an AMD 233
>
> [7.3.] Module information (from /proc/modules):
>
> 3c59x 19332 2 (autoclean)
>
> Note this was also happenning with 2 LinkSys (tulip) NICs rather than the 2
> 3Com now in use
>
> [7.4.] SCSI information (from /proc/scsi/scsi)
>
> General information:
> Chip sym53c875E, device id 0xf, revision id 0x26
> IO port address 0xc000, IRQ number 5
> Using memory mapped IO at virtual address 0xc4000000
> Synchronous period factor 12, max commands per lun 32
>
> [7.5.] Other information that might be relevant to the problem
> (please look in /proc and include all information that you
> think to be relevant):
>
> No idea what would be relevant in proc. The system is a DSL connected
> firewall with masquerading for two other systems (although it fails
> whether they are active or off), running DNS (bind 8.2.2 5), Apache,
> Proftpd, sendmail.
>
> [X.] Other notes, patches, fixes, workarounds:
>
> Nope. All suggestions welcome. I'm ready to climb a tree and not come
> down again.
>
> Whit Blauvelt
> whit@transpect.com
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>

Andre Hedrick
The Linux ATA/IDE guy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 29 2000 - 21:00:11 EST