Re: [PATCH] ahci.c: fix ati sb600 sata IRQ_TF_ERR

From: Conke Hu
Date: Thu Mar 15 2007 - 08:01:07 EST


On 3/14/07, Tejun Heo <htejun@xxxxxxxxx> wrote:
Hello,

Conke Hu wrote:
> When there is no media in SATA CD/DVD drive or media is not ready,
> AHCI controller fails to execute the ATAPI commands TEST_UNIT_READY,
> READ_CAPACITY or READ_TOC and reports PORT_IRQ_TF_ERR. But ATI SB600
> SATA controller sets SERR_INTERNAL bit in the error register at the
> same time, which is not necessary. This patch is just to ignore the
> INTERNAL ERROR in such case. Without this patch, ahci error handler
> will report many errors as below:

Whoa, SERR_INTERNAL on ATAPI check condition? Just for fun, here's what
the spec says about SERR_INTERNAL.


When media is not ready, command TEST_UNIT_READY fails with ahci irq
status == 0x40000001(IRQ_TF_ERROR) and serror == SERR_INTERNEL, then
ahci error handler calls atapi_eh_request_sense() and sets
ATA_QCFLAG_SENSE_VALID. Command REQUEST_SENSE executes successfully
and atapi_qc_complete() sets result = SAM_STAT_CHECK_CONDITION, and
now the whole TEST_UNIT_READY request is done and returns .


E Internal error: The host bus adapter experienced an internal error
that caused the operation to fail and may have put the host bus adapter
into an error state. Host software should reset the interface before
re-trying the operation. If the condition persists, the host bus adapter
may suffer from a design issue rendering it incompatible with the
attached device.


Yes, I saw this too :) and I am contacting the hardware engineers to
check if there is any hardware bug.
But, even though this were a hardware bug and could be fixed, we would
still need this patch since many SB600 boards have already come into
the market and those ASICs can never be fixed :(
So, if no errors in this patch, could Jeff please apply it ASAP?


Anyways thanks for fixing this. Just a few comments.

> --- linux-2.6.21-rc3-git8/drivers/ata/ahci.c.orig 2007-03-25
> 20:57:31.000000000 +0800
> +++ linux-2.6.21-rc3-git8/drivers/ata/ahci.c 2007-03-25
> 21:03:54.000000000 +0800
> @@ -80,6 +80,7 @@ enum {
> board_ahci_pi = 1,
> board_ahci_vt8251 = 2,
> board_ahci_ign_iferr = 3,
> + board_ahci_ati = 4,
>
> /* global controller registers */
> HOST_CAP = 0x00, /* host capabilities */
> @@ -168,6 +169,7 @@ enum {
> AHCI_FLAG_NO_NCQ = (1 << 24),
> AHCI_FLAG_IGN_IRQ_IF_ERR = (1 << 25), /* ignore IRQ_IF_ERR */
> AHCI_FLAG_HONOR_PI = (1 << 26), /* honor PORTS_IMPL */
> + AHCI_FLAG_TF_ERR_FIX = (1 << 27), /* ignore INTERNAL ERROR on
> IRQ_TF_ERROR */

Can we use board_ahci_ign_interr and AHCI_FLAG_IGN_SERR_INTERNAL to keep
it more consistent with the other IGN flag?

> - { PCI_VDEVICE(ATI, 0x4380), board_ahci }, /* ATI SB600 non-raid */
> + { PCI_VDEVICE(ATI, 0x4380), board_ahci_ati }, /* ATI SB600 non-raid */
> { PCI_VDEVICE(ATI, 0x4381), board_ahci }, /* ATI SB600 raid */

4381 isn't affected while 4380 is?

I never see such an ID, and plan to remove 0x4381.
The patch which added the PCI IDs was not sent out by myself. I
checked all SB600 boards, and not found any 0x4381 controller, only
0x4380 instead. In fact, SB600 RAID and Non-RAID share the same PCI
device ID, only with class code different.



Hmmm... Okay, this is weird. I'm feeling very strong deja vu.

Well, I must be getting alzheimer. I did almost the same patch a month
ago and was waiting for verification to properly submit the patch.

http://thread.gmane.org/gmane.linux.ide/16049/focus=16437

Anyways, Conke Hu, can you please take a look at my patch from a month
ago? It's almost identical but SERR_INTERNAL is always ignored on both
SB600 PCI IDs, which I think is safer. Does this fix what you're seeing?


I just read your patch. Another difference is that my patch ignores
SERR_INTERNAL only when the command is ATAPI and IRQ_TF_ERR occurs. In
other cases, I think, we'd better not ignore the SERR_INTERNEL. Right?
The following is some detail:
// your patch:
+ if (ap->flags & AHCI_FLAG_IGN_SERR_INTERNAL)
+ serr &= ~SERR_INTERNAL;

// mine:
- if (irq_stat & PORT_IRQ_TF_ERR)
+ if (irq_stat & PORT_IRQ_TF_ERR) {
err_mask |= AC_ERR_DEV;
+
+ /* some controllers set INTERNAL ERROR on ATAPI
IRQ_TF_ERROR, ignore it */
+ if ((serror & SERR_INTERNAL) &&
+ (ap->flags & AHCI_FLAG_TF_ERR_FIX) &&
+ qc && qc->dev->class == ATA_DEV_ATAPI) {
+ serror &= ~SERR_INTERNAL;
+ }
+ }

Tejun, you do me a great favor, thank you so much! for your previous
help, too :)

Conke
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/