Problems with 2.0.30 SMP and IDE

Jose Orlando Pereira (mesjop@di.uminho.pt)
Fri, 27 Jun 1997 15:47:27 +0100


Hi,

After upgrading an SMP machine (2 x PPro 200 fitted out of a possible 4,
Intel chipset, manufactured by ALR) from 2.0.27 to 2.0.30,
messages like these started to appear in the logs:

Jun 17 16:56:38 homer kernel: hda: write_intr: status=0xd0 { Busy }
Jun 17 16:56:38 homer kernel: ide0: reset: success

and eventually the machine crashed with severe filesystem corruption.
(Of course, I only found the messages after the crash...) Some people
depend on this machine, so I am not able to experiment with it
as much as I would like to.

Here is the full story: after three days and a sequence of about
a dozen messages like the first, this appeared:

Jun 20 06:33:49 homer kernel: hda: recal_intr: status=0xd0 { Busy }
Jun 20 06:33:49 homer kernel: ide0: reset: master: error (0x00?)
Jun 20 06:33:49 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 20 06:33:49 homer kernel: end_request: I/O error, dev 03:01, sector
1007023
Jun 20 06:33:49 homer kernel: hda: no DRQ after issuing WRITE
[run of NUL bytes in the log]status=0xd0 { Busy }
Jun 20 06:33:49 homer kernel: hda: drive not ready for command

[more messages like the first]

Jun 20 07:54:24 homer kernel: hda: recal_intr: status=0xd0 { Busy }
Jun 20 07:54:24 homer kernel: ide0: reset: master: error (0x00?)
Jun 20 07:54:24 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 20 07:54:24 homer kernel: end_request: I/O error, dev 03:03, sector
17759
Jun 20 07:54:24 homer kernel: hda: no DRQ after issuing WRITE
Jun 20 07:54:24 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 20 07:54:24 homer kernel: hda: drive not ready for command
Jun 20 07:54:24 homer kernel: ide0: reset: success
Jun 20 07:57:09 homer kernel: EXT2-fs warning (device 03:01):
ext2_unlink: Deleting nonexistent file (132749), 0
Jun 20 07:57:09 homer kernel: EXT2-fs warning (device 03:01):
ext2_free_blocks: bit already cleared for block 534707
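
For readers mapping the "dev 03:01" and "dev 03:03" fields above to
partitions: the kernel prints the device as major:minor in hex, and on
Linux major 3 is the first IDE interface, with minors 0-63 belonging to
hda and 64-127 to hdb. A minimal sketch of that mapping follows. The
function name decode_ide0_dev is made up for this illustration and is
not anything in the kernel.

```python
def decode_ide0_dev(dev):
    """Map a 'major:minor' hex pair from the log to a device name.

    Only major 3 (first IDE interface) is handled; anything else is
    rejected rather than guessed at.
    """
    major_text, minor_text = dev.split(":")
    major = int(major_text, 16)
    minor = int(minor_text, 16)
    if major != 3 or not 0 <= minor < 128:
        raise ValueError("not a device on the first IDE interface: " + dev)
    drive = "hda" if minor < 64 else "hdb"
    partition = minor % 64          # 0 means the whole disk
    return drive if partition == 0 else drive + str(partition)


for dev in ("03:01", "03:03", "03:04"):
    print(dev, "->", decode_ide0_dev(dev))
```

So the I/O errors hit hda1 and hda3, and the later "03:04" messages
refer to hda4, all on the same physical drive.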

Three days later:

Jun 23 17:03:44 homer kernel: EXT2-fs warning (device 03:01):
ext2_free_blocks: bit already cleared for block 449458

This was repeated several times for different blocks.

Finally:

Jun 23 17:37:25 homer kernel: hda: write_intr: status=0xd0 { Busy }
Jun 23 17:37:26 homer kernel: ide0: reset: success
Jun 23 17:40:23 homer kernel: attempt to access beyond end of device
Jun 23 17:40:23 homer kernel: 03:04: rw=0, want=1500517841,
limit=2044224

[ more messages like the first ]

Jun 23 18:28:52 homer kernel: hda: status error: status=0x58 {
DriveReady SeekComplete DataRequest }
Jun 23 18:28:52 homer kernel: hda: drive not ready for command
Jun 23 18:28:53 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 23 18:28:53 homer kernel: hda: no DRQ after issuing WRITE

[ more messages like the first ]
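
As a reading aid for the status values in these messages, here is a
rough sketch of how the status byte breaks down into the names the
driver prints. The bit masks are the standard ATA status register bits.
The names for bits that do not occur in the logs above (WriteFault,
CorrectedError, Index, Error) are my own labels and may not match the
driver's wording. The sketch assumes, as the logs suggest, that only
"Busy" is reported while the busy bit is set, since the other bits are
not meaningful then.

```python
BUSY = 0x80

# (mask, label) pairs for the remaining status bits, high to low.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "WriteFault"),       # label is an assumption
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),   # label is an assumption
    (0x02, "Index"),            # label is an assumption
    (0x01, "Error"),            # label is an assumption
]


def decode_ide_status(status):
    """Return the list of flag names for an IDE status byte."""
    if status & BUSY:
        # While the drive is busy, the other bits are ignored.
        return ["Busy"]
    return [label for mask, label in STATUS_BITS if status & mask]


for value in (0xD0, 0x58):
    print(hex(value), "{", " ".join(decode_ide_status(value)), "}")
```

With these assumptions 0xd0 decodes to Busy alone and 0x58 to
DriveReady SeekComplete DataRequest, matching the text in the log.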

Jun 24 22:15:24 homer kernel: attempt to access beyond end of device
[run of NUL bytes in the log]imit=2044224

More "attempt to access beyond end of device" messages followed, and
finally the crash. After reboot and fsck:

Jun 25 15:06:51 homer kernel: EXT2-fs warning (device 03:04):
empty_dir: bad directory (dir #30796) - no `.' or `..'

We eventually made a backup, created a new filesystem and restored the
data. No more problems since then, running 2.0.27.

At the same time, messages like these appear both with
2.0.27 and 2.0.30:

Jun 17 14:42:40 homer kernel: eth0: Ethernet frame spanned multiple
buffers,status 7fffceff!
Jun 17 14:42:40 homer kernel: eth0: Ethernet frame spanned multiple
buffers,status 06b281ae!

The card is a Digital DE500 and the Tulip driver is being used as
a module. Can this be the cause?

If more details are needed (e.g. the full logs or hardware
details) I'll be glad to provide them.

Thanks in advance,

-- 
Jose Orlando Pereira
* jop@di.uminho.pt * http://gsd.di.uminho.pt/~jop *