Sudden 5.1 reboot problem / SCSI kernel issue?

William W. Austin (bill@baustin.alph.att.com)
Fri, 16 Oct 1998 11:14:20 -0400


I posted this to the devel list (and to the generic list), and it was
suggested that I post it here as well. I have just subscribed to the
linux-kernel list and should be able to get replies there; however,
please feel free to write me privately if this is an old issue and you
don't want to waste bandwidth. Thanks.

============================================================================

I have a system (233MHz P2, 64MB RAM, Matrox Millennium II 250MHz) which has
had 2 Adaptec controllers in it for months, a 2940UW and a 2940U (PCI
slots 1 and 2, respectively), and until today I have had no problems
whatsoever with the configuration.

Before going further: I last modified the kernel about a month ago,
but I have rebooted (cold and hot) multiple times since then, and there have
been no hardware changes whatsoever since before that last kernel rebuild.

This morning I decided to reboot (I had to run some stupid M$ crap
program under Win95 on a different disk partition; sigh), and, as root,
issued the reboot command. After doing this I looked at my watch, realized
that I did not have time to follow through before leaving for work, and
so let the reboot proceed normally into Linux (rather than Win95).

HOWEVER, the machine proceeded 'normally' through recognizing the two SCSI
controllers (2940UW first, then 2940U, along with several drives, tape drives,
CD-ROM, CD-R, etc.) and THEN PROCEEDED TO RECOGNIZE THEM AGAIN AS CONTROLLERS
2 AND 3. I copied the following out longhand from the stuck boot screen:

[[... normal stuff deleted]]

> scsi 2: Adaptec AHA274x/284x/294x (EISA/VLB/PCI - Fast SCSI) 5.0.19/3.2.2
> <Adaptec AHA 294x Ultra SCSI host adapter>
> scsi 3: Adaptec AHA274x/284x/294x (EISA/VLB/PCI - Fast SCSI) 5.0.19/3.2.2
> <Adaptec AHA 294x Ultra SCSI host adapter>

[[ NOTE: the 2 adapters have already been identified as scsi 0: and scsi 1: ]]

> scsi: 4 hosts
> (scsi2:0:-1:-1) scanning channel for devices
> scsi: aborting connect due to device timeout PID 58, s... 2, channel 0,
> id0, lun 0 Test Unit Ready 00 00 00 00

> Kernel Panic: aic7xxx: AWAITING_MSG for an SCB that does not have a waiting
> message
>
> In swapper task - not syncing

At this point the boot fails of course, and the system sits there dead in
the water.

I have been running the 2.0.35-2 kernel since (approx.) July 25th. The only
change since then was a rebuild of the kernel to (a) remove sound devices and
(b) add support for a multi-port Digiboard (PCI/Xem); however, the system has
run smoothly since then.

I also verified the kernel, the boot info, etc. against 2 known-good
backup tapes (of the same system), and they are byte-for-byte identical.

I can successfully boot using either the boot disk I made immediately
after going to 2.0.35-2 (from 2.0.34-1) or the one I made immediately after
the kernel recompile/install. However, even after running lilo again
(and recreating initrd-2.0.35.img), the system still fails to boot
'normally' and continues to find not 2 but 4 SCSI controllers.

As a check I booted MS-DOS and ran the Adaptec diagnostics, and both
controllers pass all tests there. For what it's worth (negative $0.02),
Windoze95 boots and runs normally on this hardware as if nothing had
ever happened.

So I am currently stumped on this one. Has anyone seen this behavior, and
if so, are there any suggestions? (I have scanned the last 100 or so
digests and have not found anything related to this so far.)

Thanks in advance,

William W. Austin waustin@romulan.alph.att.com
770 750-6954 VOICE / 770 750-7321 FAX
===============================================================================
"Life is just a phase I'm going through..."
