Re: scsi-problem (phase change ?)

Hauke Johannknecht (ash@ash.ccc.de)
Sun, 22 Jun 1997 13:05:32 +0200 (MET DST)


On Sun, 22 Jun 1997, Gerard Roudier wrote:

> > ncr53c810-0-<0,0>: phase change 2-7 6@00249c20 resid=2.
> > ncr53c810-0-<0,0>: phase change 2-7 10@0024962c resid=4.
> The scsi controller saw a phase change from COMMAND phase to
> MESSAGE_IN phase, with some residual data of the SCSI COMMAND not
> accepted by the drive. If we exclude some problem in the DCRS drive,
> the most probable reason is some bad signal level on the SCSI bus that
> corrupted data or broke the scsi protocol.
urgl.
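
For reference, a small sketch of how the "2-7" in those log lines maps to phase names. The phase numbers follow the SCSI-2 encoding of the MSG, C/D and I/O signals; the regular expression and the name decode_phase_change are assumptions made up for illustration, based only on the two lines quoted above and not on the driver's actual format string.

```python
import re

# SCSI bus phases, indexed by the signal bits MSG (4), C/D (2) and I/O (1).
SCSI_PHASES = {
    0: "DATA_OUT",
    1: "DATA_IN",
    2: "COMMAND",
    3: "STATUS",
    4: "RESERVED",
    5: "RESERVED",
    6: "MESSAGE_OUT",
    7: "MESSAGE_IN",
}

# Assumed pattern: derived only from the two log lines quoted in this mail.
LINE_RE = re.compile(r"phase change ([0-7])-([0-7]) .*resid=(\d+)")


def decode_phase_change(line):
    """Return (old_phase, new_phase, residual), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    old, new, resid = (int(g) for g in m.groups())
    return SCSI_PHASES[old], SCSI_PHASES[new], resid


print(decode_phase_change("ncr53c810-0-<0,0>: phase change 2-7 6@00249c20 resid=2."))
```

Both quoted lines decode to a COMMAND to MESSAGE_IN transition, which matches Gerard's reading; only the residual count differs (2 and 4).
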
hmm, how can i describe my scsi-bus ...
i will go from one terminator to the other ...
first end (inside computer)
|--- Quantum LPS with terminators installed (passive ?) --->
---> IBM DCRS (no way to terminate this one ... <g>) ---->
---> NCR 810 --->
---> external cable (80cm) --->
---> external 4-device-box --->
---> Seagate ST1600N --->
---> Sanyo CD-ROM --->
---> HP 6020i --->
---> end of external box --->
---> external active terminator ---|
makes a total of no more than 2 meters ...

> We probably should expect such problems to be recovered, however,
> error recovery is very hard to implement and to test and, in any case,
> it is not possible in my opinion to recover from all kinds of errors.
perhaps i should try to avoid the error ... :)

> I think that mixing old and recent devices and devices with too different
> purpose and speed on the same SCSI bus, or connecting too many
> devices on the same SCSI bus increases the probability of SCSI problems.
all scsi-II ... the DCRS is ultra ... the other ones are not
even fast-scsi i guess ... (yes, i removed my Miniscribe
20M-HDD with external stepper and SCSI-I-CSS command set ... :)

> > seems to happen only if the system is running under
> > heavy load AND the ST is powered up some time ...
> Do you mean that you powered up the ST while the system is running?
yup. it is connected to the bus all the time, but normally
powered down (no 5 and 12 V ...) .. and after the "booting"
of the disk and an add-single-device everything worked for
more than an hour. and then --->> KABOOM !

> > (can an overheated hdd data-kill another one via the scsi-bus ?)
> Since the SCSI bus is a shared resource, any device on the bus can
> make the resource unusable.
i guess a scsi-bus-testing-device costs more than a new harddisk ? :)

> > - WHO is responsible ?
> Us.
> You, because your SCSI bus configuration looks like something that risks
> a lot of problems, and if you used to switch on your ST under heavy load.
> And me, if it is possible to recover from such errors.
smile, i thought of an answer like "blame it on the ST" ...
sorry if i offended you and the other programmers ... :)
ST is switched on in a "kind-of-idle" state ... but as i said
before, it works for quite a long time after power-up !

> > - HOW can i stop it ?
> Trying to recover from such errors in the driver, if it is possible, would
> perhaps cure the consequence but not repair the system, if as I think
> your SCSI system (all components sharing the resource) uses a mix that
> increases too much SCSI problem probability.
> It is better to try to fix the cause, in my opinion.
yup. my bet is the overheated ST ... i will try to
find out, if i am right.

> My recommendation is to use more than 1 scsi BUS and to distribute devices
> among buses in a way that will minimize the risk to get SCSI problems.
> 2 buses is generally enough for most systems.
> Base choice on speed, purpose, age, quality, etc.. of scsi devices.
> That cannot be bad, at least for performance when you are using 2 devices
> with very different speed at the same time.
smile, i would like to use a 3940, if you give me one for free ... :)

> As an example, here is my SCSI system description:
> - NCR53C810 that drives an IBM S12 narrow fast SCSI-2 HD and a Toshiba
> 3401D SCSI-2 CD/ROM.
> - NCR53C875 that drives an Atlas I Wide HD and an Atlas II Ultra Wide HD.
hmm, i could add a ST-02 for driving the ST1600N ... ;)

> All that stuff with a BUS as short as possible and only active
> terminations.
i think my bus is ok, just the drive drives it crazy ...

Anyways, thanx for your help.
now i at least know what is going wrong.

Gruss,
Hauke

-- 
Hauke Johannknecht               5johannk@informatik.uni-hamburg.de
-> Hamburg / Germany <- -                   tschechow@geocities.com 
http://www.geocities.com/TimesSquare/Arcade/9242/    ash@ash.ccc.de 
--> pgp-key via public-key-server and on request <--> use pgp ! <--