Re: [SMP patch] io-apic-patch-2.1.97-A

Robert HYATT (hyatt@cis.uab.edu)
Sun, 19 Apr 1998 15:08:03 -0500 (CDT)


sorry... when I said "kerneld" I wasn't thinking... I meant either
/var/log/messages or /var/log/syslog or wherever you have your syslog.conf
set to send kernel messages...
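
For what it's worth, here is a rough sketch of how one might scan those logs for the timeout message. It assumes the kernel messages land in /var/log/messages or /var/log/syslog (check your syslog.conf; yours may go elsewhere). The function names and the regex are only illustrative, and the exact wording of the driver message may differ on your kernel.

```python
import re

# Assumed pattern: driver messages shaped like "eth0: TX timed out..."
TIMEOUT_RE = re.compile(r"\beth\d+: .*timed out", re.IGNORECASE)


def find_timeouts(lines):
    """Return the lines that look like NIC transmit-timeout messages."""
    return [line.rstrip("\n") for line in lines if TIMEOUT_RE.search(line)]


def scan_logs(paths=("/var/log/messages", "/var/log/syslog")):
    """Scan each readable log file and collect the matching lines per path."""
    hits = {}
    for path in paths:
        try:
            with open(path, errors="replace") as fh:
                hits[path] = find_timeouts(fh)
        except OSError:
            # File missing or unreadable; syslog.conf may point somewhere else.
            continue
    return hits


if __name__ == "__main__":
    for path, lines in scan_logs().items():
        for line in lines:
            print(f"{path}: {line}")
```

If nothing shows up, that only means no line matched this particular pattern in those two files; it does not rule out the driver.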

Bob

Robert Hyatt Computer and Information Sciences
hyatt@cis.uab.edu University of Alabama at Birmingham
(205) 934-2213 115A Campbell Hall, UAB Station
(205) 934-5473 FAX Birmingham, AL 35294-1170

On Sun, 19 Apr 1998, Bill Broadhurst wrote:

> On Sun, Apr 19, 1998 at 12:28:32PM -0500, Robert HYATT wrote:
> >
> > On Sun, 19 Apr 1998, Bill Broadhurst wrote:
> >
> > >
> > > 1. Processes mysteriously die during a long (more than 2 hour)
> > > compile. The dead process can't be killed and shows 'D' in
> > > ps. If that process happens to be linked to a device, that
> > > device is continually 'busy' and only a reboot can free it.
> > > Any other process that touches the 'busy' device also dies.
> > > If this device is the root file system, the system processes
> > > will also (eventually) die, leaving the system in a "hung"
> > > state. No net access is possible and the Magic SysRq keys
> > > work (sorta) most of the time. <Magic-boot usually does.>
> > > This happens 100% of the time during a long modeling program or
> > > when re-compiling the entire X package.
> > >
> > > This symptom started at 2.1.85 and continues. It happens sooner on
> > > some kernels and later on others. It happens on all my systems. I'll list
> > > them later.
> >
> >
> > you should check the kerneld logfile, to see if you see any sort of
> > message like "eth0: TX timed out..." I get this and my machine hangs,
> > but not "hard". No net traffic, can't start new processes nor exit
> > old ones, but some things "sorta" work for a bit...
>
> No Kerneld logfile. No kerneld. No kmod. I don't use dynamic module
> loading. All modules here are loaded at boot time by insmod and
> remain loaded. Soon I'll put all the modules back into the kernel.
>
> >
> > I've tracked this to high ethernet traffic blowing out the ethpro100
> > driver, getting it into a state from which it can't recover. It's been
> > there a *long* time, but got *really* bad in 96-pre1 and 96, although I
> > can't try 97 until tomorrow sometime... unless I build and boot from
> > home tonite (I hate doing this because if the boot hangs, it's a 20+
> > mile drive in to my office to unhang it..)
> >
>
> Only one Intel NIC and that's on a low-traffic system as I had the same
> troubles with the driver as did you. No Ethernet hangs at all.
>
> >
> > >
> > > 2. On 2.1.96/7 the system will hang *hard* during a tape backup to
> > > the SCSI tapes. This also happens 100% of the time on the units with
> > > tape drives. (It also doesn't matter whether the drive being backed up
> > > is local or on another machine on the net.) All tapes are on
> > > BusLogic BT930 controllers of various vintages. I did move one to
> > > my remaining Adaptec controller, and it does the same thing, only much
> > > further into the tape.
> > >
> >
> > SCSI has a definite problem in 2.1.96. I found I could not copy a large
> > file from one SCSI drive to another (large=200-500mb) without the machine
> > hanging *hard*. I backed up to 2.1.85 (the only older kernel I happened
> > to have saved in a handy place) and the SCSI copies went perfectly with
> > no problems at all. So something is "up" in 96 for certain, at least with
> > the combination of the bt958 SCSI and etherexpress Pro 100 ethernet
> > card. Sendmail would forward me a few email messages, then it would hang. I'd
> > try to ftp a 20mb file from one machine to my quad processor (on a
> > switched hub which provides good thruput) and it would hang... And then
> > I found I couldn't even copy files from one drive to another reliably...
> >
>
> Agreed. And it continues into 2.1.97. I just hung the system *hard*
> by copying a 19M log file.
>
> >
> >
> >
> > > As noted, this symptom began (I think) at 2.1.96, but I'll have to
> > > verify this as most backups have been made on a non-intel machine over
> > > the net since 2.1.88. I did a restore of about 200M of files on this
> > > machine under 2.1.95 without incident.
> > >
> >
>
> -bb
>
> --
> ----------------------------------------------------------------------
> Dr. Bill Broadhurst | Independent contract Engineer.
> (619)296-3710 | BIOS, Firmware, & Diagnostics.
> bbroad@CX492564-a.dt1.sdca.home.com | Finger for PGP 5.0 public key.
> ----------------------------------------------------------------------
>
