....
RedHat Linux 8....two IDE
80gig hard-drives in a Software RAID-1... they all crash, they all
run RedHat Linux, and they all use IDE.
We also get various IDE errors in the /var/log files, such as: (also another
weird IRQ error - except we're running a stock config! (ie/ no PCI devices
other than one NIC... No idea if they're related to the IDE thing <sigh>.
The IRQ thing appears to be USB related)
== during server runtime ==
kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=150637065,
sector=150636992
kernel: end_request: I/O error, dev 16:01 (hdc), sector 150636992
kernel: hdc: dma_intr: status=0x53 { DriveReady SeekComplete Index Error }
kernel: hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=150630007,
sector=150629920
kernel: end_request: I/O error, dev 16:01 (hdc), sector 150629920
== during server boot ==
kernel: usb.c: new USB bus registered, assigned bus number 3
kernel: hub.c: USB hub found
kernel: hub.c: 2 ports detected
kernel: PCI: Found IRQ 3 for device 00:10.2
kernel: IRQ routing conflict for 00:10.0, have irq 9, want irq 3
kernel: IRQ routing conflict for 00:10.1, have irq 9, want irq 3
kernel: IRQ routing conflict for 00:10.2, have irq 9, want irq 3
kernel: IRQ routing conflict for 00:10.3, have irq 9, want irq 3
kernel: usb-uhci.c: USB UHCI at I/O 0x9400, IRQ 9
Usually after that happens (the IDE errors) it will crash fairly soon. Other
times no errors are logged but the server is *extremely* slow - likely due
to disk performance. My guess is that there is some sort of internal IDE
error (A "CorrectableError"?) that the kernel is recovering from and not
writing a message to the log.
Once and awhile after a major kernel panic or reboot, the system refused to
reboot at all, going into an endless cycle of disk checking. We tried
various brand-name ram suppliers in case it was a ram corruption - no luck.
Everything points to IDE.
We tried various motherboards, including the following chipsets:
VIA KT333
VIA KT400
SiS 745
The problem definately occurs on 2.4.20-19.8, but also some earlier kernel
versions as well (which I can't remember). It only happens during extremely
high disk usage (I've seen it fail with about 8% CPU and Memory Usage...)
It's not our database server - it runs fine on our older "BX-Boards" (the
older 300mhz intels) and on various Windows NT/2000 boxes. The configuration
of the database server is exactially the same as on the other servers (I
double checked at least 5 times...)
After about 4 months of random server crashes and corruption, various trials
and testing, I'm fairly certain it must be a hardware interaction between
the IDE hard-drives and the Linux kernel. We've gone through 20 IDE
hard-drives that work fine and look fine after the crashes - definately not
a physical hard-drive problem. Definately not a motherboard problem because
we've been through 4 different motherboards, different manufacturers and 3
different chipsets.
Ideas..? Help..? :( The clients are going to be banging down the doors any
day if we don't get operational servers, and so far the only solution that I
believe works 100% is to install Windows (doh). As a last resort I'm asking
to see if any of you Kernel-gurus have ideas :-)
Thanks,
-Eric Bickle
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/