Dancing in the streets with 2.0.27

Edward Welbon (welbon@bga.com)
Sun, 8 Dec 1996 14:50:53 -0600 (CST)


Sorry if this is a duplicate of a previous posting. My link died at
exactly the moment I pressed send, and the log suggests the message
did not make it out.

Howdy Uncle George,

Actually, I have been dancing ever since I installed the 2.0 series. They
have all worked quite well for me; in particular, 2.0.27 is a gem. I have
had no significant problems at all with what I do (a 2-way P6 system with
7 disks, tape, CD-ROM, scanner, four SCSI controllers, etc.). The only
problem I have had was a failing WD IDE drive. The following are my
observations; they are worth at least what they cost you.

You really should include more details about the failure. Is the
NCR/Symbios controller on the motherboard, or is it a separate adapter?
I have two NCR adapters gathering dust (Tyan 825). I don't like them:
they seem somewhat unforgiving and have caused crashes (YMMV, and no SCSI
adapter jihads please; this is just my personal observation, and I could
be wrong about it for reasons I'll mention below). If the controller is a
separate adapter, you should at least try another model.

Also, there are two versions of the NCR SCSI driver, and one of them
offers a choice between I/O-port and memory-mapped control. Try the
option that forces I/O-port control. It might be hard to get through a
kernel compile if your disk system is failing, though.

Lastly, and probably most importantly, _PERHAPS_ the disk itself is the
problem; a failing disk can make SCSI accesses go bonkers. Remember, there
are two participants in every transfer: the initiator (the controller) and
the target (the disk). The disk plays a major role in the transfer (I
don't mean to be patronizing, but the controller often gets the blame when
the disk itself is the problem).

One factor that can kill disks is heat. The 7200 rpm disks are (in
general) particularly bad, though other disks run hot too (low power
dissipation has been a deciding factor in my recent disk purchases). I
have a Seagate Barracuda 15150W that *must* be cooled by a fan dedicated
to that purpose. As a rule of thumb, the disk should never be
uncomfortably warm to the touch. If you have ever stood near a rack with
100 or so disks in it on a cold day or in a temperature-controlled lab,
you know what I mean: such a rack makes a superb space heater.

Try:

(1) Taking the covers off the box and putting a small ordinary house
fan near the disk (better still, position the disk outside the box,
NOT RESTING ON CARPET BUT SUSPENDED SO THAT THERE IS AIR FLOW OVER
THE ENTIRE OUTSIDE SURFACE; re-read that a few times).

(2) Putting the disk in another system, or trying a different disk in
your system.

(3) Using a different SCSI card (if possible).

(4) Backing up soon (sync soon, sync often 8-).

Remember, in general the smaller the enclosure, the worse its cooling,
particularly if the disk sits very close to other devices that also
dissipate considerable power. Heat kills silicon. Most boxes have too few
fans.

I have found that running Bonnie or iozone continuously is a reasonably
good way to make the disk dissipate significant power, and so to test the
cooling capability of its intended enclosure. Running multiple copies of
Bonnie or iozone in a continuous loop on widely separated files is useful,
since this forces lots of arm motion and hence makes the disk dissipate
lots of heat. The sizes of the files must sum to at least twice the
available RAM, so that the buffer cache cannot absorb the I/O.
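To make the sizing rule concrete, here is a minimal Python sketch. It
assumes you know the RAM size and how many copies you plan to run;
file_size_per_copy and stress_pass are hypothetical helpers, and
stress_pass is only a crude stand-in for one Bonnie or iozone iteration,
not a replacement for either tool.

```python
import math
import os


def file_size_per_copy(ram_bytes: int, copies: int, factor: int = 2) -> int:
    """Per-file size so that `copies` files together total at least
    `factor` x RAM, which keeps the buffer cache from absorbing the I/O."""
    if ram_bytes <= 0 or copies < 1 or factor < 1:
        raise ValueError("ram_bytes, copies and factor must be positive")
    return math.ceil(factor * ram_bytes / copies)


def stress_pass(path: str, size: int, block: int = 64 * 1024) -> int:
    """One write-then-read pass over `path` (hypothetical stand-in for a
    single Bonnie/iozone iteration). Returns the number of bytes read back."""
    chunk = os.urandom(block)
    written = 0
    with open(path, "wb") as f:
        while written < size:
            n = min(block, size - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push the data to the disk, not just the cache
    read = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(block)
            if not data:
                break
            read += len(data)
    return read
```

For example, with 64 MB of RAM and four copies, each file needs to be
32 MB. Run one loop of stress_pass per file, with the files on widely
separated parts of the disk (e.g. different partitions), for as long as
you want to soak the enclosure.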

P.S. I am looking for something that can sense the temperature of a
"thing" and switch a fan on in response. I have an Exide power supply
that has heat-sensitive switches that enable a fan to run (I can't find
these switches locally, though). In the best of all worlds, I would like a
way to linearly modulate the speed of the fans according to the
temperature of the monitored "thing". I am not really interested in
eating up an ISA slot to achieve this; the function is far too simple to
require CPU intervention.
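The linear modulation wished for above is just a clamped straight line
from temperature to fan duty cycle. The Python sketch below only
illustrates that transfer function; the same curve could be built in
analog hardware with no CPU involved. The thresholds (35 and 50 degrees C)
and the name fan_duty are hypothetical, not figures from any disk or fan
datasheet.

```python
def fan_duty(temp_c: float, t_low: float = 35.0, t_high: float = 50.0,
             min_duty: float = 0.0, max_duty: float = 1.0) -> float:
    """Linearly map a temperature reading to a fan duty cycle (0..1).

    At or below t_low the fan runs at min_duty, at or above t_high at
    max_duty, and in between the duty rises in a straight line.
    Default thresholds are illustrative assumptions only.
    """
    if t_high <= t_low:
        raise ValueError("t_high must be greater than t_low")
    if temp_c <= t_low:
        return min_duty
    if temp_c >= t_high:
        return max_duty
    frac = (temp_c - t_low) / (t_high - t_low)
    return min_duty + frac * (max_duty - min_duty)
```

Setting min_duty above zero keeps the fan turning slowly even when the
disk is cool, which avoids the on/off cycling of a simple thermostat
switch.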

I can easily trigger a shutdown -f (via a serial port) using something
like powerd: instead of having a relay close on power failure, I can have
it close on high temperature, and powerd will not know it is being lied
to. One could even simulate a low battery (given that this is indicated by
a relay closing) on the same cable that connects to the UPS, if you don't
mind a little hardware hackery. This won't cut the power, but it will
force the system into an idle mode, which is generally a low-power
consumption state.
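The same trick can be sketched in a few lines of Python on Linux: poll
the serial port's modem-status bits with the TIOCMGET ioctl and give up
once the thermostat relay has stayed closed for several polls in a row.
This is not powerd; it assumes the relay is wired so that closing it
asserts DCD (carrier detect), and the device name, poll interval and the
helpers line_asserted, read_modem_status, Debouncer and watch are all
hypothetical.

```python
import fcntl
import os
import struct
import termios
import time


def line_asserted(status: int, mask: int = termios.TIOCM_CAR) -> bool:
    """True if all modem-status bits in `mask` (default: DCD) are set."""
    return (status & mask) == mask


def read_modem_status(fd: int) -> int:
    """Read the modem control/status bits of an open serial port (Linux)."""
    buf = fcntl.ioctl(fd, termios.TIOCMGET, struct.pack("I", 0))
    return struct.unpack("I", buf)[0]


class Debouncer:
    """Trip only after `needed` consecutive asserted samples, so a single
    glitch on the line does not shut the machine down."""

    def __init__(self, needed: int = 3):
        self.needed = needed
        self.count = 0

    def update(self, asserted: bool) -> bool:
        self.count = self.count + 1 if asserted else 0
        return self.count >= self.needed


def watch(device: str = "/dev/ttyS0", interval: float = 5.0,
          needed: int = 3) -> None:
    """Block until the (assumed) thermostat relay on `device` has held DCD
    asserted for `needed` polls in a row, then return to the caller."""
    # O_NONBLOCK so open() does not wait for carrier; O_NOCTTY so the port
    # does not become our controlling terminal.
    fd = os.open(device, os.O_RDONLY | os.O_NOCTTY | os.O_NONBLOCK)
    try:
        deb = Debouncer(needed)
        while not deb.update(line_asserted(read_modem_status(fd))):
            time.sleep(interval)
    finally:
        os.close(fd)
```

When watch() returns, the caller can start an orderly shutdown, for
example with subprocess.run(["shutdown", "-h", "now"]), much as powerd
would on a power-fail signal.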

On Sun, 8 Dec 1996, Uncle George wrote:

> Date: Sun, 08 Dec 1996 12:33:28 -0500
> From: Uncle George <gatgul@voicenet.com>
> To: torvalds@cs.helsinki.fi
> Cc: axp-list@redhat.com, linux-kernel@vger.rutgers.edu
> Subject: Re: Linux-2.0.27 and 2.1.14 ( dont dance yet )
>
>
> Linus Torvalds wrote:
> >
> >
> > Check it out, send me comments, and dance joyously in the streets,
> >
> > Linus
> >
> My Alpha/noname/pci33 scsi ( NCR ) controller does not seem to get along
> with the OS's mentioned above. I Lived with the prob in 2.0.14 for a
> long time, and recently upgraded. No Luck - scsi controller still
> crashes,
> and ( has so far ) taken the system with it.
>
> I'm Not sure what the problem, the messages go off the screen, so I cant
> observ what the first/initial complaint is.
> gat
>

Ed Welbon; welbon@bga.com;