2.1.96 Freeze

Mike Black (mblack@csihq.com)
Fri, 17 Apr 1998 07:16:31 -0400


I reported a freeze earlier on two different machines. I tried 2.1.96 again
this morning and it locked up again within about 10 minutes. Interestingly
enough, it didn't all die at once. I had a telnet session open, and the first
thing I noticed was that my e-mail client was timing out. The telnet session
was still working (hitting return gave a new prompt), but as soon as I ran
"netstat -t" it died too. ALT-SYSRQ-P showed c01084d0 as the current EIP,
and my System.map says:

c0108430 t hard_idle
c0108468 T sys_idle
c0108514 T machine_restart

c01084d0 falls between sys_idle and machine_restart, i.e. inside sys_idle.
So, I don't think this looks very suspicious (it SHOULD be in idle --
yes?).
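
For anyone who wants to repeat the lookup, here is a minimal sketch of how
the EIP can be matched against System.map. It assumes only the three
System.map lines quoted above; the helper names parse_system_map and resolve
are made up for illustration and are not part of any kernel tool.

```python
import bisect

# The three System.map lines quoted in this message.
SYSTEM_MAP = """\
c0108430 t hard_idle
c0108468 T sys_idle
c0108514 T machine_restart
"""

def parse_system_map(text):
    """Return a list of (address, name) tuples sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def resolve(symbols, eip):
    """Return (name, offset) for the last symbol at or below eip, else None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip is below the first known symbol
    addr, name = symbols[i]
    return name, eip - addr

symbols = parse_system_map(SYSTEM_MAP)
name, offset = resolve(symbols, 0xc01084d0)
print(f"{name}+{offset:#x}")  # prints "sys_idle+0x68"
```

With a full System.map the same function applies unchanged; only the
SYSTEM_MAP text would be read from the file instead.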

It looks to me like this is related to disk I/O. My e-mail client was
trying to send mail (i.e. file access). My telnet session was NOT doing file
access until I gave it a command. I could switch consoles (no file I/O), but
couldn't log in (needs the "login" program). I could use ALT-SYSRQ (no file
I/O) but couldn't "Umount" disks from ALT-SYSRQ (file I/O). I could reboot
from ALT-SYSRQ but could not do CTRL-ALT-DEL (file I/O).

I think I'm seeing a pattern here.

Both of my systems (I saw the same type of lockup on a 2nd box) are
SCSI-based (no IDE, although it is compiled in). Machine#1:

aic7xxx: <Adaptec AHA-294X Ultra SCSI host adapter> at PCI 5
aic7xxx: BIOS enabled, IO Port 0x6100, IO Mem 0xf1000000, IRQ a, Revision B
aic7xxx: Single Channel, SCSI ID 7, 16/255 SCBs, QFull 16, QMask 0x1f
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 4.1/3.2
scsi : 1 host.
scsi0: Scanning channel A for devices.
Vendor: HP Model: C3725S Rev: 5153
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: MICROP Model: 4421-07 0329SJ Rev: 0329
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
Vendor: PLEXTOR Model: CD-ROM PX-6XCS Rev: 1.00
Type: CD-ROM ANSI SCSI revision: 02
Detected scsi CD-ROM sr0 at scsi0, channel 0, id 2, lun 0
scsi : detected 1 SCSI cdrom 2 SCSI disks total.
Uniform CD-ROM driver Revision: 2.12
SCSI device sda: hdwr sector= 512 bytes. Sectors= 4194058 [2047 MB] [2.0 GB]
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 4193360 [2047 MB] [2.0 GB]

I do have RAID5 support compiled in on both machines but am only using it on
the 2nd one (Machine#2):

aic7xxx: <Adaptec AHA-294X Ultra SCSI host adapter> at PCI 5
aic7xxx: BIOS enabled, IO Port 0x6100, IO Mem 0xf1000000, IRQ 10, Revision B
aic7xxx: Single Channel, SCSI ID 7, 16/255 SCBs, QFull 16, QMask 0x1f
scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 4.1/3.2
scsi : 1 host.
scsi0: Scanning channel A for devices.
Vendor: HP Model: C3725S Rev: 5153
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: SEAGATE Model: ST19171N Rev: 0019
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
Vendor: SEAGATE Model: ST19171N Rev: 0024
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi0, channel 0, id 2, lun 0
Vendor: MICROP Model: 4421-07 0329SJ Rev: 0329
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdd at scsi0, channel 0, id 3, lun 0
Vendor: MICROP Model: 4421-07 0329SJ Rev: 0329
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sde at scsi0, channel 0, id 4, lun 0
Vendor: MICROP Model: 4421-07 0329SJ Rev: 0329
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdf at scsi0, channel 0, id 5, lun 0
Vendor: ARCHIVE Model: Python 25501-XXX Rev: 5.45
Type: Sequential-Access ANSI SCSI revision: 02
Detected scsi tape st0 at scsi0, channel 0, id 6, lun 0
scsi : detected 1 SCSI tape 6 SCSI disks total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 4194058 [2047 MB] [2.0 GB]
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 17783112 [8683 MB] [8.7 GB]
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 17783112 [8683 MB] [8.7 GB]
SCSI device sdd: hdwr sector= 512 bytes. Sectors= 4193360 [2047 MB] [2.0 GB]
SCSI device sde: hdwr sector= 512 bytes. Sectors= 4193360 [2047 MB] [2.0 GB]
SCSI device sdf: hdwr sector= 512 bytes. Sectors= 4193360 [2047 MB] [2.0 GB]

As a side note -- Machine#2 has not run on any kernel newer than 2.1.89
(probably RAID5 related -- it locks up solid). Machine#1 has worked on MOST
of them (I had the same lockup problems that other people reported on one or
two kernels).

So, Machine#1 is now (and has been) running 2.1.95 and Machine#2 is on
2.1.89. 2.1.96 does have wholesale changes to aic7xxx.c, so I suppose I
could put the old aic7xxx.c back in and see if the problems go away.
