Re: SATA exceptions with 2.6.20-rc5

From: Björn Steinbrink
Date: Mon Jan 15 2007 - 16:17:53 EST


On 2007.01.14 17:43:53 -0600, Robert Hancock wrote:
> Björn Steinbrink wrote:
> >Hi,
> >
> >with 2.6.20-rc{2,4,5} (no other tested yet) I see SATA exceptions quite
> >often, with 2.6.19 there are no such exceptions. dmesg and lspci -v
> >output follows. In the meantime, I'll start bisecting.
>
> ...
>
> >ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
> >ata1.00: cmd e7/00:00:00:00:00/00:00:00:00:00/a0 tag 0 cdb 0x0 data 0 in
> > res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
> >ata1: soft resetting port
> >ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> >ata1.00: configured for UDMA/133
> >ata1: EH complete
> >SCSI device sda: 160086528 512-byte hdwr sectors (81964 MB)
> >sda: Write Protect is off
> >sda: Mode Sense: 00 3a 00 00
> >SCSI device sda: write cache: enabled, read cache: enabled, doesn't
> >support DPO or FUA
>
> Looks like all of these errors are from a FLUSH CACHE command and the
> drive is indicating that it is no longer busy, so presumably done.
> That's not a DMA-mapped command, so it wouldn't go through the ADMA
> machinery and I wouldn't have expected this to be handled any
> differently from before. Curious..

My latest bisection attempt actually led to your sata_nv ADMA commit. [1]
I've now backed out that patch from 2.6.20-rc5 and have my stress test
running for 20 minutes now ("record" for a bad kernel surviving that
test is about 40 minutes IIRC). I'll keep it running for at least 2 more
hours.

The test is pretty simple:
while /bin/true; do ls -lR > /dev/null; done
while /bin/true; do echo 255 > /proc/sys/vm/drop_caches; sleep 1; done

running in parallel.

Björn

[1] 2dec7555e6bf2772749113ea0ad454fcdb8cf861
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/