PDC20265 ide_dma_timeout and RAID5 issues (2.4.17)

From: Henrique de Moraes Holschuh (hmh@rcm.org.br)
Date: Wed Dec 26 2001 - 09:06:17 EST


I've been trying to get Linux software RAID to work with two QUANTUM
FIREBALLP AS40.0 drives. Each drive is connected as the master on one channel
of a Promise PDC20265 controller (ASUS A7V [not A7V133!] board, newest BIOS).
Both drives use 80-wire IDE cables, with no slave devices on either channel.

Reading and writing with dd is fine, and so is RAID1. However, if (and only
if) I try to read from a RAID5 device built on the two drives (a 3-disk array
running in degraded mode), the system loses sync with the PDC20265
controller and starts spilling out DMA timeout and lost-interrupt errors.
It takes a SysRq-assisted sync+reboot to recover the system.
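
For reference, the degraded array is assembled roughly like this with the
0.90 raidtools (just a sketch; the partition names, chunk size and the
placeholder entry for the absent third disk are illustrative, not a verbatim
copy of my raidtab):

  /etc/raidtab:

      raiddev /dev/md3
          raid-level            5
          nr-raid-disks         3
          nr-spare-disks        0
          persistent-superblock 1
          chunk-size            32
          device                /dev/hde1
          raid-disk             0
          device                /dev/hdg1
          raid-disk             1
          # third disk not present yet; the array runs degraded
          device                /dev/hdi1
          failed-disk           2

  # mkraid /dev/md3
  # dd if=/dev/md3 of=/dev/null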

Here is the error log:
# dd if=/dev/md3 of=/dev/null
raid5: switching cache buffer size, 4096 --> 1024
hdg: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hdg: status error: status=0x00 { }
hdg: drive not ready for command
hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x04 { DriveStatusError }
hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x04 { DriveStatusError }
hdg: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdg: read_intr: error=0x04 { DriveStatusError }
ide3: reset: success
hdg: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hdg: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hde: timeout waiting for DMA
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
hde: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hde: read_intr: error=0x04 { DriveStatusError }
ide2: reset: success
hdg: lost interrupt
hdg: lost interrupt
hdg: lost interrupt
hdg: lost interrupt

The same problem also happens with 2.2.20 + IDE patches + the new RAID
patches. Interestingly, RAID1 does not trigger the bug, and all sorts of
parallel reads with dd do not trigger it either; only RAID5 seems able to
trigger it.
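
In other words, something along these lines does not trigger it (a sketch;
the block size and the exact devices read are arbitrary here):

  # dd if=/dev/hde of=/dev/null bs=64k &
  # dd if=/dev/hdg of=/dev/null bs=64k &
  # wait

while a plain sequential read through the degraded RAID5 device does:

  # dd if=/dev/md3 of=/dev/null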

The kernel is 2.4.17 with the improved K7+VIA "Athlon bug stomper" patch,
plus the Debian patches (the bug also shows up without the K7 patch). Also
attached are the lspci -v output for this machine and the boot log.

Any ideas on how to fix this one? I will gladly help to debug and test
patches for this issue...

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

