Re: 2.6.18-rc2 Intermittent failures to detect sata disks

From: Keith Owens
Date: Tue Jul 25 2006 - 02:27:26 EST


Jeff Garzik (on Tue, 25 Jul 2006 01:57:08 -0400) wrote:
>Keith Owens wrote:
>> Keith Owens (on Fri, 21 Jul 2006 16:18:47 +1000) wrote:
>>> I am seeing intermittent failures to detect sata disks on
>>> 2.6.18-rc2. Dell SC1425, PIIX chipset, gcc 4.1.0 (opensuse 10.1).
>>> Sometimes it will detect both disks, sometimes only one, sometimes none
>>> at all. AFAICT it only occurs after a soft reboot, and possibly only
>>> after an emergency reboot. Alas the problem is so intermittent that it
>>> is hard to tell what conditions will trigger it.
>>
>> I applied the debug patch below, turned on printk timing and set
>> initdefault to 6 so the machine was in a continual soft reboot cycle.
>> After multiple cycles I got this trace. piix_sata_prereset() reads a
>> zero config byte for almost 15 seconds then it changes to 0x11,
>> followed by a hang. Why is the config byte initially zero, and what
>> makes it change? The normal value for pcs is 0x33.
>
>Can you try 2.6.18-rc2-git3?
>
> Jeff

Running now, with the trivial bug fix below plus my debug patch. I
will leave it running overnight, since this problem is very intermittent.
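
For anyone reading the pcs values quoted above (0, then 0x11, against a
normal 0x33), here is a minimal userspace sketch of decoding such a
value. It assumes a layout that this thread does not spell out: bit N is
the enable bit for port N and bit 4+N is the presence bit for port N.
The helper names pcs_port_enabled(), pcs_port_present() and pcs_dump()
are hypothetical and are not part of ata_piix.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * ASSUMED layout (not stated in this thread):
 *   bit N     = port N enabled
 *   bit 4 + N = port N present
 * for N in 0..3.
 */
enum {
	PCS_ENABLE_SHIFT  = 0,
	PCS_PRESENT_SHIFT = 4,
	PCS_MAX_PORTS     = 4
};

/* Return 1 if the enable bit for this port is set in pcs, else 0. */
int pcs_port_enabled(uint16_t pcs, unsigned int port)
{
	assert(port < PCS_MAX_PORTS);
	return (pcs >> (PCS_ENABLE_SHIFT + port)) & 1;
}

/* Return 1 if the presence bit for this port is set in pcs, else 0. */
int pcs_port_present(uint16_t pcs, unsigned int port)
{
	assert(port < PCS_MAX_PORTS);
	return (pcs >> (PCS_PRESENT_SHIFT + port)) & 1;
}

/* Print one line per port for the first nports ports of a pcs value. */
void pcs_dump(uint16_t pcs, unsigned int nports)
{
	unsigned int port;

	for (port = 0; port < nports && port < PCS_MAX_PORTS; port++)
		printf("pcs=0x%02x port%u: enabled=%d present=%d\n",
		       (unsigned int)pcs, port,
		       pcs_port_enabled(pcs, port),
		       pcs_port_present(pcs, port));
}
```

Under that assumed layout, 0x33 would mean both ports enabled and
present, 0x11 only port 0 enabled and present, and 0 neither port.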

Trivial bug fix:

---
drivers/scsi/ata_piix.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux/drivers/scsi/ata_piix.c
===================================================================
--- linux.orig/drivers/scsi/ata_piix.c
+++ linux/drivers/scsi/ata_piix.c
@@ -567,8 +567,8 @@ static int piix_sata_prereset(struct ata
 			present = 1;
 	}

-	DPRINTK("ata%u: LEAVE, pcs=0x%x present_mask=0x%x\n",
-		ap->id, pcs, present_mask);
+	DPRINTK("ata%u: LEAVE, pcs=0x%x present=0x%x\n",
+		ap->id, pcs, present);

 	if (!present) {
 		ata_port_printk(ap, KERN_INFO, "SATA port has no device.\n");

Debug patch:

---
drivers/scsi/ata_piix.c | 5 +++++
include/linux/libata.h | 4 ++++
2 files changed, 9 insertions(+)

Index: linux/drivers/scsi/ata_piix.c
===================================================================
--- linux.orig/drivers/scsi/ata_piix.c
+++ linux/drivers/scsi/ata_piix.c
@@ -529,6 +529,7 @@ static void piix_pata_error_handler(stru
 	ata_bmdma_drive_eh(ap, piix_pata_prereset, ata_std_softreset, NULL,
 			   ata_std_postreset);
 }
+int ata_debug = 1;

 /**
  *	piix_sata_prereset - prereset for SATA host controller
@@ -555,6 +556,7 @@ static int piix_sata_prereset(struct ata
 	int port, i;
 	u16 pcs;

+repeat:
 	pci_read_config_word(pdev, ICH5_PCS, &pcs);
 	DPRINTK("ata%u: ENTER, pcs=0x%x base=%d\n", ap->id, pcs, base);

@@ -569,6 +571,9 @@ static int piix_sata_prereset(struct ata

 	DPRINTK("ata%u: LEAVE, pcs=0x%x present=0x%x\n",
 		ap->id, pcs, present);
+	if (pcs == 0)
+		goto repeat;
+	ata_debug = 0;

 	if (!present) {
 		ata_port_printk(ap, KERN_INFO, "SATA port has no device.\n");
Index: linux/include/linux/libata.h
===================================================================
--- linux.orig/include/linux/libata.h
+++ linux/include/linux/libata.h
@@ -61,6 +61,10 @@
 #define VPRINTK(fmt, args...)
 #endif /* ATA_DEBUG */

+extern int ata_debug;
+#undef DPRINTK
+#define DPRINTK(fmt, args...) if (ata_debug) printk(KERN_ERR "%s: " fmt, __FUNCTION__, ## args)
+
 #define BPRINTK(fmt, args...) if (ap->flags & ATA_FLAG_DEBUGMSG) printk(KERN_ERR "%s: " fmt, __FUNCTION__, ## args)

 /* NEW: debug levels */
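
A side note on the DPRINTK override in the debug patch: a macro whose
body is a bare if statement can capture an else that follows it at the
call site. The userspace sketch below illustrates the hazard and the
usual do { } while (0) wrapping. fprintf() stands in for printk(), and
DPRINTK_BARE, DPRINTK_SAFE, classify_bare(), classify_safe() and
check_else_binding() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>

int ata_debug = 1;

/* Same shape as the debug override: the body is a bare if statement. */
#define DPRINTK_BARE(...) if (ata_debug) fprintf(stderr, __VA_ARGS__)

/*
 * Wrapped form: expands to exactly one statement, so a following else
 * still pairs with the caller's if.
 */
#define DPRINTK_SAFE(...) \
	do { if (ata_debug) fprintf(stderr, __VA_ARGS__); } while (0)

/* The else below binds to the if hidden inside DPRINTK_BARE. */
int classify_bare(int present)
{
	int result = 0;

	if (present)
		DPRINTK_BARE("device present\n");
	else
		result = -1;
	return result;
}

/* Here the else pairs with "if (present)" as the indentation suggests. */
int classify_safe(int present)
{
	int result = 0;

	if (present)
		DPRINTK_SAFE("device present\n");
	else
		result = -1;
	return result;
}

/* Exercise both variants with debugging switched off. */
void check_else_binding(void)
{
	int saved = ata_debug;

	ata_debug = 0;
	assert(classify_safe(1) == 0);   /* present: no error */
	assert(classify_safe(0) == -1);  /* absent: error reported */
	assert(classify_bare(1) == -1);  /* else ran for the hidden if */
	assert(classify_bare(0) == 0);   /* else never reached */
	ata_debug = saved;
}
```

The DPRINTK calls touched by the patches above are plain statements, so
this may never bite here; it only matters if the macro is later used in
an unbraced if/else.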
