Re: 2.0.0/2.0.25 oopses with buslogic 956C and two 4G seagates ...

Leonard N. Zubkoff (lnz@dandelion.com)
Sat, 16 Nov 1996 09:51:26 -0800


From: "Peter T. Breuer" <ptb@oboe.it.uc3m.es>
Date: Sat, 16 Nov 1996 14:55:36 +0000 (WET)

Here is the oops from my buslogic scsi machine.

> Nothing immediately comes to mind as broken about your configuration.
> Assuming you have compiled PCI support into the kernel, the BT-956C should
> not be reported as using I/O port 0x330. If it were using I/O port 0x330,
> it might be

Why not? Isn't it supposed to be? That is what the bios setup says!

Sorry, I should have been more precise. AutoSCSI will report that the ISA
Compatible I/O Port at 0x330 is being used, but the Linux driver will not. The
reason is that when the driver determines the PCI I/O Port from the PCI BIOS,
it then commands the card to disable the ISA Compatible I/O Port, partly to
avoid recognizing the same card twice, and partly to avoid unnecessary
conflicts with other devices. If PCI BIOS support is not available, then the
driver will only recognize the ISA Compatible I/O Port and you'd see 0x330 in
the driver's startup messages.
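
As a rough illustration of the probing behavior described above, here is a
minimal sketch in C. It is not the actual driver code: the structure, the
function name probe_adapter, and the convention that port 0 means "disabled"
are all assumptions made for this example. It only models the decision of
which I/O port ends up in the startup messages.

```c
#include <stdbool.h>

/* Hypothetical model of one host adapter; not the real driver's structures. */
struct adapter_model {
    unsigned pci_io_port;     /* port assigned via the PCI BIOS, e.g. 0x6000 */
    unsigned isa_compat_port; /* ISA Compatible I/O Port, e.g. 0x330; 0 = disabled */
};

/*
 * Returns the I/O port the driver would report for this adapter,
 * or 0 if no usable port is found.
 */
unsigned probe_adapter(struct adapter_model *a, bool pci_bios_available)
{
    if (pci_bios_available && a->pci_io_port != 0) {
        /*
         * The PCI port is known, so turn off the ISA Compatible port.
         * A later ISA probe then cannot find the same card a second
         * time or collide with another device at that address.
         */
        a->isa_compat_port = 0;
        return a->pci_io_port;
    }
    /* No PCI BIOS support: only the ISA Compatible port is visible. */
    return a->isa_compat_port;
}
```

With PCI BIOS support the model reports 0x6000 and leaves the ISA Compatible
port disabled; without it, the same card is reported at 0x330.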

I am now using the 2.0.23 buslogic.bin slackware boot disk downloaded
from your home page (nice work!). Same result, but more error messages.
I'll type out what I can now see on the machine next to me ...

... 39.73 BogoMips (this is a 120MHz? Intel Pentium)
hda: FX001DE, ATAPI CDROM drive (configured as master)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
Floppy Drive(s) fd0 is 1.44M
Started kswapd v 1.4.2.2
FDC 0 is a post-1991 82077
scsi0: Configuring Buslogic Model BT956C PCI Wide SCSI Host Adapter
Firmware Version: 4.28A, I/O Address: 0x6000, IRQ Channel: 11/Level

Note the I/O Address is listed as 0x6000 here, not 0x330.

PCI Bus: 0, Device: 13, Address: Unassigned, Host Adapter SCSI ID: 7
Parity Checking: Enabled, Extended Translation: Disabled
Synchronous Negotiation: Fast, Wide Negotiation: Enabled
Scatter/Gather Limit: 128 of 8192 segments, Mailboxes: 255
Driver Queue Depth: 255, Host Adapter Queue Depth: 100
Tagged Queue Depth: Automatic, Untagged Queue Depth: 3
Error Recovery Strategy: Default, SCSI Bus Reset: Enabled
SCSI Bus termination: Both Enabled

The above messages all look reasonable.

scsi0: CCB #0 to Target 0 Impossible State

Here's the real problem. This message indicates that the Completion Code field
of the CCB (Command Control Block) has a value that is impossible based on the
control flow of the driver. This pretty much indicates that something trashed
the CCB data structure, which also explains the OOPS that follows. No doubt
CCB->Command is invalid, and so the access to Command->scsi_done faults. Why
this should be occurring I have no idea. It may well be hardware.
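
To make the failure mode concrete, here is a minimal sketch in C of a
completion path that checks a CCB before touching the command it points to.
The type and function names (struct ccb, complete_ccb, example_done, and the
enum values) are hypothetical stand-ins chosen for this example, not the
driver's real definitions, and the checks shown are an illustration rather
than what the driver does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical completion codes; any value >= COMPLETION_CODE_COUNT is "impossible". */
enum completion_code {
    COMPLETED_OK,
    COMPLETED_WITH_ERROR,
    ABORTED_AT_HOST_REQUEST,
    COMPLETION_CODE_COUNT
};

struct scsi_command {
    void (*scsi_done)(struct scsi_command *); /* completion callback */
    int result;
};

struct ccb {
    enum completion_code completion_code;
    struct scsi_command *command;
};

/* Example callback that only counts how often it was invoked. */
int done_count = 0;
void example_done(struct scsi_command *cmd)
{
    (void)cmd;
    done_count++;
}

/* Returns true if the completion was delivered, false if the CCB looked corrupted. */
bool complete_ccb(struct ccb *ccb)
{
    /* A code outside the known set means something overwrote the CCB. */
    if ((unsigned)ccb->completion_code >= COMPLETION_CODE_COUNT)
        return false;
    /* Calling through a missing command or callback pointer would fault. */
    if (ccb->command == NULL || ccb->command->scsi_done == NULL)
        return false;
    ccb->command->result = (ccb->completion_code == COMPLETED_OK) ? 0 : -1;
    ccb->command->scsi_done(ccb->command);
    return true;
}
```

A NULL check like this catches only the simplest corruption. A trashed CCB
is more likely to hold a non-NULL garbage pointer, which no such check can
detect, and the dereference then faults just as in the oops.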

This should be interpreted via your buslogic.bin System.map ... it does not
seem to be on the diskette. Hang on - I will search your web pages again.
Nope it's not there. Sorry - I can't do any better than this. I hope you can
match the assembler some other way.

Excellent point. I hadn't thought of that. I've now made the corresponding
System.map file available and placed a link on the web page. The fault occurs
near the end of BusLogic_ProcessCompletedCCBs as I expected.

I would guess that the fault is when init can't be started because there is no
disk! Can't we die gracefully? I am on a ramdisk. The stack seems to be
repeating. Maybe something called itself recursively and went out of the
stack "segment"? Would that be a gpf?

If the driver and BT-956C cannot communicate properly using CCBs and Mailboxes,
then no SCSI commands can be properly executed.

To rub salt in the wound, here is the FreeBSD dmesg for this machine (again).
Can anyone give me a clue?

I notice a couple of things from the FreeBSD messages. First, you seem to have
a UMC chipset. I don't know much about this particular UMC chipset in detail,
but my general experiences with UMC have not been good. You could try a Linux
kernel without PCI BIOS support and see if that helps; it would more closely
model the behavior of the FreeBSD driver, since it seems to know nothing of
PCI.

Date: Sat, 16 Nov 1996 13:35:07 GMT
From: "Peter T. Breuer" <ptb@oboe.it.uc3m.es>

In article <9611160025.AA06904@feral.com> you wrote:
: I have a BT958 with almost exactly the same other h/w as you,
: and I'm fine. I wonder what the difference between that and the 956 is?

For one thing, it moves. I walked in this morning and found the box wailing
for help. The monitor won't light and it just beeps on bootup. It was Ok at
12.30pm last night when it sent me its system stats via a cronjob (this is
under FreeBSD).

Well, if the machine is now dead to FreeBSD, then I suspect that you have a
hardware failure which just happened to manifest itself first on Linux.

Since it was dead, I opened up the box and took it apart. It looks to
me as if the seagates (ST15230W) have no jumpers of any sort on, which
according to the "manual", means that they are both terminated? But who
can understand the manual? Which particular subcase of which card I am
in is not exactly crystal clear. The cable goes from the BT-956C
controller to drive 0 to drive 1, so that looks like smoke-inhalation
time to me! (No - I did not do any installation on this machine after
it came here from the shop, but the technicians have been at it). I
suppose the controller is auto-terminated too? I don't have a manual
for that. In any case - wouldn't I need to jumper the drives so that
they know which is 0 and which is 1? Or can they tell by position?

My driver reported that SCSI termination was enabled for both the high and low
bytes. Termination should also be enabled for drive 1, according to your
description. The host adapter is not auto-terminated. The BT-956C termination
is enabled/disabled using the AutoSCSI utility; the newer BT-958 does have
automatic smart termination. There must be jumpers somewhere on those drives
to set the target IDs; they cannot tell by position. The Seagate Web Site
should have diagrams of where to find the jumpers that control target ID and
other options.

If termination is not set correctly, it might well be possible that FreeBSD
would function and Linux would not. FreeBSD has no support for tagged queuing,
so it will never have more than one command outstanding to the drive at a time.
The Linux driver supports tagged queuing which allows for much higher
performance, but that also makes an incorrectly implemented or marginal SCSI
bus more problematic.
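
The difference in bus load can be put in a small model. The sketch below is
only an illustration: the function name max_outstanding and its parameters
are assumptions for this example, and it ignores everything except how many
commands may be in flight to one drive at once.

```c
#include <stdbool.h>

/*
 * Upper bound on commands in flight to a single drive.
 * Without tagged queuing the model allows one command at a time;
 * with it, up to queue_depth commands may be outstanding.
 */
unsigned max_outstanding(bool tagged_queuing, unsigned queue_depth,
                         unsigned pending_requests)
{
    unsigned limit = tagged_queuing ? queue_depth : 1;
    return pending_requests < limit ? pending_requests : limit;
}
```

With 20 requests pending, the model gives 1 outstanding command without
tagged queuing, and as many as the queue depth allows with it, which is
where a marginal bus is more likely to show errors.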

Any advice? I have never had to deal with scsi's before. Honest, when
I left last night all was well!

Try the usual experiments to determine why your hardware is dead, swapping
components out, etc. The beeps at startup may tell you which component is ill;
consult the motherboard manual for more information on that.

Leonard