Re: Sporadious hang on 2.0.3[0,1,2,3,4pre2]

Bjarni R. Einarsson (bre@mmedia.is)
Wed, 4 Mar 1998 16:54:39 +0000


> From: Keith Rowland <keithr@primenet.com>
> Date: Wed, 04 Mar 1998 03:08:28 -0700
> Subject: Re: Sporadious hang on 2.0.3[0,1,2,3,4pre2]
>
> Daniel Ryde wrote:
> >
> > I have several "nightmare" hangs (total freeze, dead keyboard, ither
> > black or freezed normal screen) on six dial-in terminalservers here. It
> > has been like this since I left kernel 2.0.29. It can happen once a month
> > up to twice a day, under low loadavg, from a few users up to 25 users.
> > Nothing in the logs whatsoever.
>
> Welcome to the club Daniel. There are many of us who have this unstable
> stable release. I have a high volume web server, actually 4 different
> ones, that ALL exhibit this behavior. Let me assure you, it is nothing
> that you are doing specifically. It is something either in the kernel or
> some common configuration or user program.
>
> I have been fighting this for 3 months. Initially I thought it was
> something I did and have not been able to figure out anything that I am
> doing wrong and have given up the fight to those better than I.

Add me to the list.

I am running 3 dial-in servers (w/ Cyclades boxen), a web server, a
high-throughput proxy server, a backup/masquerade box, a mail server and a
shell account machine, all on Linux.

Uptimes are as follows, along with the reasons for the last reboot and the
hardware differences.

Box      Uptime    Last reboot    Hardware
dialin    3 days   froze          Cyrix686, Cyclades, IDE disk
dialin    3 days   memory leak    Cyrix686, Cyclades, IDE disk
dialin   34 days   maintenance    Intel P75, Cyclades, IDE disk
mail      7 days   froze          Intel P75, IDE disks + SCSI disk
web      14 days   froze          486 (?), IDE disk
backup   83 days   upgraded       486 (?), IDE disks, SCSI tape
shell    39 days   normal         Intel 80486, IDE disk
proxy    18 days   hw upgrade     Intel P75, IDE disk, 4 SCSI disks
other    16 days   normal         486 (?), IDE disk
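
For anyone who wants to slice the list above differently, here is a rough
Python sketch that groups the boxes by the reason for their last reboot. The
rows are just the list above retyped; the tuple layout and the name
group_by_reason are made up for illustration and are not part of any tool.

```python
from collections import defaultdict

# (box, uptime in days, reason for last reboot, hardware), retyped from the list above.
BOXES = [
    ("dialin", 3, "froze", "Cyrix686, Cyclades, IDE disk"),
    ("dialin", 3, "memory leak", "Cyrix686, Cyclades, IDE disk"),
    ("dialin", 34, "maintenance", "Intel P75, Cyclades, IDE disk"),
    ("mail", 7, "froze", "Intel P75, IDE disks + SCSI disk"),
    ("web", 14, "froze", "486 (?), IDE disk"),
    ("backup", 83, "upgraded", "486 (?), IDE disks, SCSI tape"),
    ("shell", 39, "normal", "Intel 80486, IDE disk"),
    ("proxy", 18, "hw upgrade", "Intel P75, IDE disk, 4 SCSI disks"),
    ("other", 16, "normal", "486 (?), IDE disk"),
]


def group_by_reason(boxes):
    """Map each reboot reason to its list of (box, uptime_days, hardware) rows."""
    groups = defaultdict(list)
    for box, days, reason, hardware in boxes:
        groups[reason].append((box, days, hardware))
    return dict(groups)


if __name__ == "__main__":
    # Print one block per reboot reason, in the order the reasons first appear.
    for reason, rows in group_by_reason(BOXES).items():
        print(reason)
        for box, days, hardware in rows:
            print(f"  {box:<8}{days:>3} days  {hardware}")
```

Grouped this way, the three boxes that froze are on Cyrix686, Intel P75 and
486 hardware, so the hardware column alone does not single out a culprit.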

Most of these boxes are running 2.0.32pre6 (most even with the same build!).
The exceptions are the "other" box, which is running 2.0.33, and the modem
box that froze most recently, which I upgraded to 2.0.33 after the freeze.
All boxes run named and xntpd. None are SMP. All have 3Com network cards,
mostly 3c509, some older models.

The dialin boxes have been the most difficult. They are also very busy
machines; I have the following things running on them:

+ Answering calls, providing PPP or slirp (SLIP) to each user (max
  64 users at a time, per box).
+ IP masquerading.
+ Transparent WWW proxy.
+ IP accounting (tables modified on each hangup/login!).

Of course, running the PPP servers off 2.0.32 (with the routing memory leak
bug) may seem silly, but I've been reading about people's troubles with
2.0.33, and I'm basically happier with a "known" bug than an unknown one..
but I guess it's safe to say I'm having troubles anyway. I'm running some
pretty old shared libraries, because I haven't gotten around to upgrading.

Before we blame the masquerading stuff.. note that the box with the highest
uptime is also masquerading and transparently proxying for all my ISDN
customers and the company's LAN. It's called a "backup box" both because it
spools backups of the other servers to tape & disk, and also because it
takes over when any of the other servers crashes. I'm very happy it has
been so stable itself. The routing tables and IP accounting tables are very
static on this box, and it is providing no user-space service to the outside
world.

The web server occasionally dies. The mail server dies slightly more
frequently. Neither of these boxes is doing any funky routing, accounting,
or masquerading. The proxy and backup boxes have been very stable.

I can't really see much of a pattern.. but I'd be happy to describe things
in more detail if people think it will help. Well.. maybe the boxes that are
dying are doing more TCP/IP stuff in user space than the others (not counting
the proxy, which is probably the busiest of them all!?).

These machines were on a network with a bunch of (even more) unstable Win95
boxes, a few NT servers and some Cisco routers. The Win95 boxes just moved
to a different network (all but one), so if my machines stop crashing I'll
attribute it all to bad influences.. ;-) I doubt that will happen though.
Everything important is on UPS.

Here's the 2.0.32pre6 config for the stable backup box.. I'm afraid I've
overwritten my config for the other ones.

The main differences are: the proxy and mail servers are both using the
AIC7xxx SCSI driver. The modem boxes are the same without the SCSI stuff.

#
# Automatically generated make config: don't edit
#

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y

#
# Loadable module support
#
CONFIG_MODULES=y
# CONFIG_MODVERSIONS is not set
CONFIG_KERNELD=y

#
# General setup
#
# CONFIG_MATH_EMULATION is not set
CONFIG_NET=y
# CONFIG_MAX_16M is not set
CONFIG_PCI=y
# CONFIG_PCI_OPTIMIZE is not set
CONFIG_SYSVIPC=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_JAVA=y
CONFIG_KERNEL_ELF=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
CONFIG_M586=y
# CONFIG_M686 is not set

#
# Floppy, IDE, and other block devices
#
CONFIG_BLK_DEV_FD=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_IDECD is not set
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_BLK_DEV_IDE_PCMCIA is not set
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
CONFIG_BLK_DEV_RZ1000=y
# CONFIG_BLK_DEV_TRITON is not set
# CONFIG_IDE_CHIPSETS is not set

#
# Additional Block Devices
#
CONFIG_BLK_DEV_LOOP=m
# CONFIG_BLK_DEV_MD is not set
CONFIG_BLK_DEV_RAM=y
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_BLK_DEV_XD is not set
# CONFIG_BLK_DEV_HD is not set

#
# Networking options
#
CONFIG_FIREWALL=y
CONFIG_NET_ALIAS=y
CONFIG_INET=y
CONFIG_IP_FORWARD=y
# CONFIG_IP_MULTICAST is not set
CONFIG_SYN_COOKIES=y
CONFIG_RST_COOKIES=y
CONFIG_IP_FIREWALL=y
# CONFIG_IP_FIREWALL_VERBOSE is not set
CONFIG_IP_MASQUERADE=y

#
# Protocol-specific masquerading support will be built as modules.
#
CONFIG_IP_MASQUERADE_IPAUTOFW=y
CONFIG_IP_MASQUERADE_ICMP=y
CONFIG_IP_TRANSPARENT_PROXY=y
CONFIG_IP_ALWAYS_DEFRAG=y
CONFIG_IP_ACCT=y
# CONFIG_IP_ROUTER is not set
CONFIG_NET_IPIP=m
CONFIG_IP_ALIAS=y

#
# (it is safe to leave these untouched)
#
# CONFIG_INET_PCTCP is not set
# CONFIG_INET_RARP is not set
# CONFIG_NO_PATH_MTU_DISCOVERY is not set
CONFIG_IP_NOSR=y
CONFIG_SKB_LARGE=y

#
#
#
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_AX25 is not set
# CONFIG_BRIDGE is not set
# CONFIG_NETLINK is not set

#
# SCSI support
#
CONFIG_SCSI=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
CONFIG_CHR_DEV_ST=y
CONFIG_BLK_DEV_SR=m
CONFIG_CHR_DEV_SG=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y

#
# SCSI low-level drivers
#
# CONFIG_SCSI_7000FASST is not set
CONFIG_SCSI_AHA152X=m
CONFIG_SCSI_AHA1542=m
# CONFIG_SCSI_AHA1740 is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_IN2000 is not set
# CONFIG_SCSI_AM53C974 is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA_DMA is not set
# CONFIG_SCSI_EATA_PIO is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
CONFIG_SCSI_GENERIC_NCR5380=m
# CONFIG_SCSI_GENERIC_NCR53C400 is not set
CONFIG_SCSI_G_NCR5380_PORT=y
# CONFIG_SCSI_G_NCR5380_MEM is not set
# CONFIG_SCSI_NCR53C406A is not set
CONFIG_SCSI_NCR53C7xx=y
# CONFIG_SCSI_NCR53C7xx_sync is not set
# CONFIG_SCSI_NCR53C7xx_FAST is not set
# CONFIG_SCSI_NCR53C7xx_DISCONNECT is not set
CONFIG_SCSI_PPA=m
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_ISP is not set
# CONFIG_SCSI_SEAGATE is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_GDTH is not set

#
# Network device support
#
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
# CONFIG_EQUALIZER is not set
# CONFIG_DLCI is not set
# CONFIG_PLIP is not set
CONFIG_PPP=y

#
# CCP compressors for PPP are only built as modules.
#
CONFIG_SLIP=m
CONFIG_SLIP_COMPRESSED=y
# CONFIG_SLIP_SMART is not set
# CONFIG_SLIP_MODE_SLIP6 is not set
# CONFIG_NET_RADIO is not set
CONFIG_NET_ETHERNET=y
CONFIG_NET_VENDOR_3COM=y
CONFIG_EL1=y
CONFIG_EL2=y
CONFIG_ELPLUS=y
CONFIG_EL16=y
CONFIG_EL3=y
CONFIG_VORTEX=m
# CONFIG_LANCE is not set
CONFIG_NET_VENDOR_SMC=y
# CONFIG_WD80x3 is not set
CONFIG_ULTRA=m
# CONFIG_ULTRA32 is not set
# CONFIG_SMC9194 is not set
CONFIG_NET_ISA=y
# CONFIG_AT1700 is not set
# CONFIG_E2100 is not set
CONFIG_DEPCA=m
CONFIG_EWRK3=m
# CONFIG_EEXPRESS is not set
# CONFIG_EEXPRESS_PRO is not set
# CONFIG_FMV18X is not set
# CONFIG_HPLAN_PLUS is not set
# CONFIG_HPLAN is not set
# CONFIG_HP100 is not set
# CONFIG_ETH16I is not set
CONFIG_NE2000=m
# CONFIG_NI52 is not set
# CONFIG_NI65 is not set
# CONFIG_SEEQ8005 is not set
# CONFIG_SK_G16 is not set
# CONFIG_NET_EISA is not set
# CONFIG_NET_POCKET is not set
# CONFIG_TR is not set
# CONFIG_FDDI is not set
# CONFIG_ARCNET is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# CD-ROM drivers (not for SCSI or IDE/ATAPI drives)
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Filesystems
#
CONFIG_QUOTA=y
CONFIG_MINIX_FS=m
# CONFIG_EXT_FS is not set
CONFIG_EXT2_FS=y
# CONFIG_XIA_FS is not set
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=m
CONFIG_UMSDOS_FS=m
CONFIG_PROC_FS=y
CONFIG_NFS_FS=y
# CONFIG_ROOT_NFS is not set
CONFIG_SMB_FS=m
CONFIG_SMB_WIN95=y
CONFIG_ISO9660_FS=m
CONFIG_HPFS_FS=m
# CONFIG_SYSV_FS is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AFFS_FS is not set
CONFIG_UFS_FS=m
# CONFIG_BSD_DISKLABEL is not set
CONFIG_SMD_DISKLABEL=y

#
# Character devices
#
CONFIG_SERIAL=y
# CONFIG_DIGI is not set
CONFIG_CYCLADES=m
# CONFIG_STALDRV is not set
# CONFIG_RISCOM8 is not set
# CONFIG_PRINTER is not set
# CONFIG_SPECIALIX is not set
# CONFIG_MOUSE is not set
# CONFIG_UMISC is not set
# CONFIG_QIC02_TAPE is not set
# CONFIG_FTAPE is not set
# CONFIG_APM is not set
# CONFIG_WATCHDOG is not set
# CONFIG_RTC is not set

#
# Sound
#
# CONFIG_SOUND is not set

#
# Kernel hacking
#
# CONFIG_PROFILE is not set
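
To compare a config like the one above against the config of a box that
hangs, something along these lines might help. It is only a sketch: it
assumes the usual "CONFIG_X=value" and "# CONFIG_X is not set" line formats
seen above, and parse_config and diff_configs are hypothetical helper names,
not existing kernel tools. The second config in the demo is invented for
illustration.

```python
import re

# Line formats as they appear in the generated config above.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
        # Anything else (blank lines, section comments) is ignored.
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option missing from one side is reported as None on that side.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    backup = parse_config("CONFIG_SCSI=y\n# CONFIG_SCSI_AIC7XXX is not set\n")
    # Hypothetical second config, for illustration only.
    other = parse_config("CONFIG_SCSI=y\nCONFIG_SCSI_AIC7XXX=y\n")
    for name, (left, right) in diff_configs(backup, other).items():
        print(f"{name}: {left} -> {right}")
```

Reading two real files with open(path).read() and passing the text to
parse_config is all that is needed to use it on actual configs.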

-- 
Bjarni R. Einarsson
 bre@margmidlun.is               [ THIS SPACE INTENTIONALLY LEFT BLANK ]
 http://www.mmedia.is/~bre
 Juggler@IRC                       "I have only one question left..."
 
