Re: SATA exceptions with 2.6.20-rc5

From: Björn Steinbrink
Date: Mon Jan 22 2007 - 21:45:39 EST


On 2007.01.22 19:24:22 -0600, Robert Hancock wrote:
> Björn Steinbrink wrote:
> >>>Running a kernel with the return statement replaced by a line that prints
> >>>the irq_stat instead.
> >>>
> >>>Currently I'm seeing lots of 0x10 on ata1 and 0x0 on ata2.
> >>40 minutes stress test now and no exception yet. What's interesting is
> >>that ata1 saw exactly one interrupt with irq_stat 0x0, all others that
> >>might have been dropped are as above.
> >>I'll keep it running for some time and will then re-enable the return
> >>statement to see if there's a relation between the irq_stat 0x0 and the
> >>exception.
> >
> >No, doesn't seem to be related, did get 2 exceptions, but no irq_stat
> >0x0 for ata1. Syslog/dmesg has nothing new either, still the same
> >pattern of dismissed irq_stats.
>
> I've finally managed to reproduce this problem on my box, by doing:
>
> watch --interval=0.1 /sbin/hdparm -I /dev/sda
>
> on one drive and then running bonnie++ on /dev/sdb connected to the
> other port on the same controller device. Usually within a few minutes
> one of the IDENTIFY commands would time out in the same way you guys
> have been seeing.
>
> After various trials and tribulations, the only conclusion I can
> come to is that this controller really doesn't like that
> NV_INT_STATUS_CK804 register being looked at in ADMA mode. I tried
> adding some debug code to the qc_issue function that would check to see
> if the BUSY flag in altstatus went high or that register showed an
> interrupt within a certain time afterwards. However, that really seemed
> to hose things; the system wouldn't even boot.

Hm, I don't think it is unhappy about us looking at NV_INT_STATUS_CK804.
I have been running 2.6.20-rc5 with the INT_DEV check removed for 8 hours
now without a single problem, and that kernel should still look at
NV_INT_STATUS_CK804, right?

I just noticed that my last email might not have been clear enough. The
exceptions happened when I re-enabled the return statement in addition
to the debug message. Without the INT_DEV check, it is completely fine
AFAICT.
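
To make the two test configurations concrete, here is a minimal user-space
sketch of what I mean. It is not the actual sata_nv code: the function names
and the SKETCH_INT_DEV bit value are assumptions made up for illustration,
and the real handler does considerably more than this.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit position of the "device interrupt" flag; illustration only. */
#define SKETCH_INT_DEV 0x01u

/*
 * Variant A (hypothetical): debug message plus the early return.
 * An irq_stat without the device bit is logged and dismissed.
 * Returns true if the interrupt is passed on for normal handling.
 */
static bool sketch_intr_with_check(uint8_t irq_stat)
{
    if (!(irq_stat & SKETCH_INT_DEV)) {
        printf("dismissed irq_stat 0x%x\n", (unsigned)irq_stat);
        return false;
    }
    return true;
}

/*
 * Variant B (hypothetical): debug message only, early return removed.
 * The status is still read and logged, but the interrupt is always
 * passed on for normal handling.
 */
static bool sketch_intr_without_check(uint8_t irq_stat)
{
    if (!(irq_stat & SKETCH_INT_DEV))
        printf("irq_stat 0x%x lacks device bit, handling anyway\n",
               (unsigned)irq_stat);
    return true;
}
```

With the values from my logs (0x10 and 0x0), variant A dismisses the
interrupt and variant B handles it. Both variants read the status, which is
why I doubt that merely looking at the register is the problem.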

> Try out this patch, it just calls the ata_host_intr function where
> appropriate without using nv_host_intr which looks at the
> NV_INT_STATUS_CK804 register. This is what the original ADMA patch from
> Mr. Mysterious NVIDIA Person did, I'm guessing there may be a reason for
> that. With this patch I can get through a whole bonnie++ run with the
> repeated IDENTIFY requests running without seeing the error.

I'll see if I can schedule a test run for tomorrow; I currently need
this box.

Thanks,
Björn