DMA blues... (Raid5 + Promise ATA/100)

From: Jasmeet Sidhu (jsidhu@arraycomm.com)
Date: Tue Feb 13 2001 - 00:51:50 EST


I have a software raid setup using the latest kernel, but the system keeps
crashing.

Each drive is connected to the respective ide port via ATA100 capable
cables. Each drive is a master..no slaves. The configuration is that
/dev/hdc is a hot spare and /dev/hd[e,g,i,k,m,o,q,s] are all setup as raid
5. These are all 75GB drives and are recognized as such.

I have searched the linux-kernel archives and saw many posts addressing the
problems that I was experiencing, namely the freezing caused by the code in
the body of delay_50ms() in drivers/ide/ide.c. This was fixed in the
current patch and as discussed earlier on the linux-kernel mailing list by
using mdelay(50). This fixed the problems to some extent, the system
seemed very reliable and I did not get a single "DriveStatusError BadCRC"
or a "DriveReady SeekComplete Index Error" for a while. But after I had
copied a large amount of data to the raid, about 17GB, the system crashed
completely and could only be recovered by a cold reboot. Before using the
latest patches, the system would usually crash after about 4-6GB of data
had been moved. Here are the log entries...

Feb 12 06:41:12 bertha kernel: hdo: dma_intr: status=0x53 { DriveReady
SeekComplete Index Error }
Feb 12 06:41:12 bertha kernel: hdo: dma_intr: error=0x84 { DriveStatusError
BadCRC }
Feb 12 06:45:42 bertha kernel: hdo: timeout waiting for DMA
Feb 12 06:45:42 bertha kernel: hdo: ide_dma_timeout: Lets do it again!stat
= 0x50, dma_stat = 0x20
Feb 12 06:45:42 bertha kernel: hdo: irq timeout: status=0x50 { DriveReady
SeekComplete }
Feb 12 06:45:42 bertha kernel: hdo: ide_set_handler: handler not null;
old=c01d0710, new=c01dac70
Feb 12 06:45:42 bertha kernel: bug: kernel timer added twice at c01d0585.
Feb 12 09:13:15 bertha syslogd 1.3-3: restart.

Let me know If I should post any additional information that might help in
troubleshooting. Is this a possible issue with the kernel code or is this
a problem with hardware? Any help is appreciated...

- Jasmeet Sidhu

Some other info that might help:

[root@bertha /root]# uname -a
Linux bertha 2.4.1-ac9 #1 Mon Feb 12 02:43:08 PST 2001 i686 unknown

Patches Applied to Kernel 2.4.1:
        1) ide.2.4.1-p8.all.01172001.patch
        2) patch-2.4.1-ac9

Asus A7V VIA KT133 and Onboard Promise ATA100 Controller (PDC20267)
1GHz AMD Thunderbird Athalon Processor
Four Promise PCI ATA100 Controllers (PDC20267)
Netgear GA620 Gigabit Ethernet Card

Boot Drive (Root + Swap)
hda: IBM-DTLA-307020, ATA DISK drive

Raid 5 Drives:
hdc: IBM-DTLA-307075, ATA DISK drive *SPARE*
hde: IBM-DTLA-307075, ATA DISK drive
hdg: IBM-DTLA-307075, ATA DISK drive
hdi: IBM-DTLA-307075, ATA DISK drive
hdk: IBM-DTLA-307075, ATA DISK drive
hdm: IBM-DTLA-307075, ATA DISK drive
hdo: IBM-DTLA-307075, ATA DISK drive
hdq: IBM-DTLA-307075, ATA DISK drive
hds: IBM-DTLA-307075, ATA DISK drive

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://vger.kernel.org/lkml/



This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 21:00:20 EST