Re: XFS shutting down due to IO timeout on SATA disk (pata_via for CX700)

From: Mark Lord
Date: Mon Sep 15 2008 - 16:30:55 EST


Tejun Heo wrote:
Bruno Prémont wrote:
For some time now, one of my systems has been "freezing" after a limited
uptime (a few hours), usually during package compilation. This seems
to happen only with recent kernel versions (2.6.27-rc*); I don't
remember whether it also happened with 2.6.26 (though I'm pretty sure it
did not happen with the early 2.6.2x series). Unfortunately, this always
shuts down the root filesystem, rendering the system unusable.

The kernel output below was generated by 2.6.27-rc5-git9. The same
symptoms occurred with other -rc releases of 2.6.27, though I
couldn't look at dmesg because it happens to / and I only recently
enabled networked syslog on that box in order to find out
what is going on.
...
Kernel error output related to XFS shutdown:
[ 9352.420180] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 9352.420247] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 9352.420261] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
..

Bruno, please also post the output from these commands:

hdparm --Istdout /dev/sda
smartctl -data -a /dev/sda

Thanks.

Timeout on FLUSH_EXT. That's a bad sign. A patch to retry FLUSH is
pending, but in any case a FLUSH failure is often accompanied by loss of
data, and XFS is doing the right thing by giving up on it.
..

Tejun, are we *sure* that's really a timeout?
The status shows 0x40 ("drive ready") there, a.k.a. "command complete".

I have a client who is also seeing this exact scenario on 750GB drives,
using a patched SLES10 kernel (2.6.16 + libata from 2.6.18 or so).

Smartctl output is clean (no logged errors), and the drives themselves
are fine after a reboot (a reboot is necessary because libata/scsi kicks
the drive out of the RAID array).

Something strange is going on here.

????
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/