After upgrading an SMP machine (2 x PPro 200 installed out of a
possible 4, Intel chipset, manufactured by ALR) from 2.0.27 to 2.0.30,
messages like this started to appear in the logs:
Jun 17 16:56:38 homer kernel: hda: write_intr: status=0xd0 { Busy }
Jun 17 16:56:38 homer kernel: ide0: reset: success
and eventually the machine crashed, with severe filesystem corruption.
(Obviously, I only found these messages after the crash...) Some people
depend on this machine, so I am not able to experiment with it
as much as I would like to.
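
For reference, the status byte in these messages can be decoded bit by
bit. Below is a minimal sketch in Python, assuming the standard ATA
status register layout. The function name decode_ide_status is made up
for illustration, and the names for bits that never show up in my logs
(DeviceFault, CorrectedError, Index, Error) are my assumption of what
the driver would print.

```python
# ATA status register bits, highest bit first.
# Busy, DriveReady, SeekComplete and DataRequest appear in the logs;
# the remaining names are assumptions for illustration.
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ide_status(status):
    """Return the list of flag names for an IDE status byte.

    Assumption: while Busy is set the other bits are not meaningful,
    so only Busy is reported (consistent with the "{ Busy }" lines).
    """
    if status & 0x80:
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if status & bit]


if __name__ == "__main__":
    for value in (0xD0, 0x58):
        print(hex(value), "{", " ".join(decode_ide_status(value)), "}")
```

Under these assumptions 0xd0 decodes to Busy only, and 0x58 (which
shows up further down) to DriveReady SeekComplete DataRequest, the
same as the text in braces in the log lines.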
Here is the full story: after three days and a sequence of about
a dozen messages like the first, this appeared:
Jun 20 06:33:49 homer kernel: hda: recal_intr: status=0xd0 { Busy }
Jun 20 06:33:49 homer kernel: ide0: reset: master: error (0x00?)
Jun 20 06:33:49 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 20 06:33:49 homer kernel: end_request: I/O error, dev 03:01, sector
1007023
Jun 20 06:33:49 homer kernel: hda: no DRQ after issuing WRITE
[NUL bytes in the log] status=0xd0 { Busy }
Jun 20 06:33:49 homer kernel: hda: drive not ready for command
[more messages like the first]
Jun 20 07:54:24 homer kernel: hda: recal_intr: status=0xd0 { Busy }
Jun 20 07:54:24 homer kernel: ide0: reset: master: error (0x00?)
Jun 20 07:54:24 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 20 07:54:24 homer kernel: end_request: I/O error, dev 03:03, sector
17759
Jun 20 07:54:24 homer kernel: hda: no DRQ after issuing WRITE
Jun 20 07:54:24 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 20 07:54:24 homer kernel: hda: drive not ready for command
Jun 20 07:54:24 homer kernel: ide0: reset: success
Jun 20 07:57:09 homer kernel: EXT2-fs warning (device 03:01):
ext2_unlink: Deleting nonexistent file (132749), 0
Jun 20 07:57:09 homer kernel: EXT2-fs warning (device 03:01):
ext2_free_blocks: bit already cleared for block 534707
Three days later:
Jun 23 17:03:44 homer kernel: EXT2-fs warning (device 03:01):
ext2_free_blocks: bit already cleared for block 449458
repeated several times for different blocks.
Finally:
Jun 23 17:37:25 homer kernel: hda: write_intr: status=0xd0 { Busy }
Jun 23 17:37:26 homer kernel: ide0: reset: success
Jun 23 17:40:23 homer kernel: attempt to access beyond end of device
Jun 23 17:40:23 homer kernel: 03:04: rw=0, want=1500517841,
limit=2044224
[ more messages like the first ]
Jun 23 18:28:52 homer kernel: hda: status error: status=0x58 {
DriveReady SeekComplete DataRequest }
Jun 23 18:28:52 homer kernel: hda: drive not ready for command
Jun 23 18:28:53 homer kernel: hda: status timeout: status=0xd0 { Busy }
Jun 23 18:28:53 homer kernel: hda: no DRQ after issuing WRITE
[ more messages like the first ]
Jun 24 22:15:24 homer kernel: attempt to access beyond end of device
[NUL bytes in the log]imit=2044224
More messages about "attempt to access beyond end of device" followed,
and finally the crash. After reboot and fsck:
Jun 25 15:06:51 homer kernel: EXT2-fs warning (device 03:04):
empty_dir: bad directory (dir #30796) - no `.' or `..'
We eventually made a backup, created a new filesystem and restored.
There have been no more problems since, running 2.0.27.
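
The device numbers in the messages above (03:01, 03:03, 03:04) are
major:minor pairs. Here is a rough sketch of how they map to partition
names, assuming the fields are hexadecimal, that major 3 is the first
IDE interface, and that each drive owns 64 minor numbers with minor 0
being the whole disk. The helper name decode_ide_dev is hypothetical.

```python
def decode_ide_dev(dev):
    """Map a "major:minor" string such as "03:01" to a device name.

    Assumptions: both fields are hexadecimal, major 3 is the first
    IDE interface (hda/hdb), and each drive has 64 minors where
    minor 0 of a drive is the whole disk.
    """
    major_s, minor_s = dev.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    if major != 3:
        raise ValueError("only major 3 (ide0) is handled in this sketch")
    if not 0 <= minor < 128:
        raise ValueError("minor out of range for two drives")
    drive = "hda" if minor < 64 else "hdb"
    part = minor % 64
    return drive if part == 0 else f"{drive}{part}"


if __name__ == "__main__":
    for dev in ("03:01", "03:03", "03:04"):
        print(dev, "->", decode_ide_dev(dev))
```

With these assumptions the errors touch hda1, hda3 and hda4, that is,
several partitions of the same disk rather than a single filesystem.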
At the same time, messages like these appear both with
2.0.27 and 2.0.30:
Jun 17 14:42:40 homer kernel: eth0: Ethernet frame spanned multiple
buffers,status 7fffceff!
Jun 17 14:42:40 homer kernel: eth0: Ethernet frame spanned multiple
buffers,status 06b281ae!
The card is a Digital DE500 and the Tulip driver is being used as
a module. Can this be the cause?
If more details are needed (e.g. the full logs or hardware
details) I'll be glad to provide them.
Thanks in advance,
-- Jose Orlando Pereira * jop@di.uminho.pt * http://gsd.di.uminho.pt/~jop *