[PATCH] 2.[46] Fix double reset in aic7xxx driver

From: Joe Korty
Date: Mon Jul 12 2004 - 14:22:10 EST


Fix occasional PCI bus parity errors on the Dell PowerEdge 4600 during
boot.

Symptoms: The LCD display would turn orange and display "PCI SYSTEM
E13F5", and the following message would appear in /var/log/dmesg:
"Uhhuh. NMI received. Dazed and confused, but trying to continue".

By inserting a PCI card with a PDC20268 IDE controller and attaching to
that a Sony DRU-510A DVD RW burner with an unloaded tray, the failure
can be made to happen on every boot.

Cause: The aic7xxx driver was resetting the onboard AIC7891 SCSI controller
while waiting for a previous reset to complete. This second reset confuses
the controller causing it to put bad data onto the PCI bus.

This is a backport of a RedHat 2.4.21-15.ELsmp fix. A letter discussing
this problem, or one very close to it, may be found at:

http://lists.us.dell.com/pipermail/linux-poweredge/2003-May/025010.html

Against 2.6.7 and 2.4.26.

Regards,
Joe


diff -ura base/drivers/scsi/aic7xxx/aic79xx_pci.c new/drivers/scsi/aic7xxx/aic79xx_pci.c
--- base/drivers/scsi/aic7xxx/aic79xx_pci.c 2004-06-16 01:19:22.000000000 -0400
+++ new/drivers/scsi/aic7xxx/aic79xx_pci.c 2004-07-12 12:50:11.000000000 -0400
@@ -452,8 +452,10 @@
* or read prefetching could be initiated by the
* CPU or host bridge. Our device does not support
* either, so look for data corruption and/or flaged
- * PCI errors.
+ * PCI errors. First pause without causing another
+ * chip reset.
*/
+ hcntrl &= ~CHIPRST;
ahd_outb(ahd, HCNTRL, hcntrl|PAUSE);
while (ahd_is_paused(ahd) == 0)
;
diff -ura base/drivers/scsi/aic7xxx/aic7xxx_pci.c new/drivers/scsi/aic7xxx/aic7xxx_pci.c
--- base/drivers/scsi/aic7xxx/aic7xxx_pci.c 2004-06-16 01:18:57.000000000 -0400
+++ new/drivers/scsi/aic7xxx/aic7xxx_pci.c 2004-07-12 12:50:11.000000000 -0400
@@ -1284,8 +1284,10 @@
* or read prefetching could be initiated by the
* CPU or host bridge. Our device does not support
* either, so look for data corruption and/or flagged
- * PCI errors.
+ * PCI errors. First pause without causing another
+ * chip reset.
*/
+ hcntrl &= ~CHIPRST;
ahc_outb(ahc, HCNTRL, hcntrl|PAUSE);
while (ahc_is_paused(ahc) == 0)
;

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/