XFS shutting down due to IO timeout on SATA disk (pata_via for CX700)

From: Bruno Prémont
Date: Thu Sep 11 2008 - 13:45:47 EST


For some time now, one of my systems has been "freezing" after a limited uptime (a few hours),
usually while compiling packages.
This seems to happen only with recent kernel versions (2.6.27-rc*). I don't remember whether
it also happened with 2.6.26, though I'm pretty sure it did not happen with the early
2.6.2x series.

Unfortunately, this always shuts down the root filesystem, rendering the system unusable.

The kernel output below was generated by 2.6.27-rc5-git9. The same symptoms occurred with
other -rc releases of 2.6.27, but I couldn't look at dmesg because the failure hits /,
and I only recently enabled networked syslog on that box in order to find out
what happens.

Unfortunately, either the chipset or the BIOS does not support AHCI for the SATA
controller: the only SATA mode offered by the BIOS is IDE.


Is this a known issue? Similar ATA exceptions seem to have been reported lately,
judging from the Google results for the error messages (the exception and the
originating command).


-- improvement suggestion --
To keep the system running, it would be nice if the failing command were reissued
after resetting the link and rediscovering the drive; that is, the error would be pushed
to upper layers only if the retried operation fails again after the reset.
-- end of suggestion --
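The suggested policy can be sketched as a small model. This is a hypothetical illustration, not libata code: struct fake_dev, issue_cmd(), reset_link_and_revalidate() and issue_with_retry() are made-up stand-ins, and the "device" simply times out a preset number of times to mimic a transient hang.

```c
#include <assert.h>

/* Hypothetical result codes; these are not libata's AC_ERR_* values. */
enum cmd_result { CMD_OK = 0, CMD_TIMEOUT = -1 };

/* Hypothetical device model: commands time out a fixed number of
 * times before succeeding, simulating a transient link/drive hang. */
struct fake_dev {
    int failures_left; /* how many more attempts will time out */
    int resets;        /* number of link resets performed so far */
};

static enum cmd_result issue_cmd(struct fake_dev *dev)
{
    if (dev->failures_left > 0) {
        dev->failures_left--;
        return CMD_TIMEOUT;
    }
    return CMD_OK;
}

/* Stand-in for "reset the link and rediscover the drive". */
static void reset_link_and_revalidate(struct fake_dev *dev)
{
    dev->resets++;
}

/* On failure, reset the link and reissue the same command; report an
 * error to upper layers only if it still fails after max_retries resets. */
static enum cmd_result issue_with_retry(struct fake_dev *dev, int max_retries)
{
    assert(max_retries >= 0);
    enum cmd_result res = issue_cmd(dev);
    for (int i = 0; res != CMD_OK && i < max_retries; i++) {
        reset_link_and_revalidate(dev);
        res = issue_cmd(dev);
    }
    return res;
}
```

With max_retries = 1, a single timeout is absorbed by one reset and the caller sees success; only a second consecutive failure would reach XFS. Whether reissuing is safe for every command type is a separate question this sketch ignores.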

If the kernel config or the complete dmesg output would help, please let me know.

If there are tuning options that could help pinpoint the cause, I can try
them out; that system is not in production use. (According to some of the messages I
found, it could be related to drive cache flushing.)

Bruno


lspci output:
00:00.0 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:0324] (rev 03)
00:00.1 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:1324]
00:00.2 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:2324]
00:00.3 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:3324]
00:00.4 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:4324]
00:00.7 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:7324]
00:01.0 PCI bridge [0604]: VIA Technologies, Inc. VT8237 PCI Bridge [1106:b198]
00:0f.0 IDE interface [0101]: VIA Technologies, Inc. Device [1106:0581]
00:10.0 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 90)
00:10.1 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 90)
00:10.2 USB Controller [0c03]: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller [1106:3038] (rev 90)
00:10.4 USB Controller [0c03]: VIA Technologies, Inc. USB 2.0 [1106:3104] (rev 90)
00:11.0 ISA bridge [0601]: VIA Technologies, Inc. CX700 PCI to ISA Bridge [1106:8324]
00:11.7 Host bridge [0600]: VIA Technologies, Inc. CX700 Internal Module Bus [1106:324e]
00:13.0 Host bridge [0600]: VIA Technologies, Inc. CX700 Host Bridge [1106:324b]
00:13.1 PCI bridge [0604]: VIA Technologies, Inc. CX700 PCI to PCI Bridge [1106:324a]
01:00.0 VGA compatible controller [0300]: VIA Technologies, Inc. CX700M2 UniChrome PRO II Graphics [1106:3157] (rev 03)
02:08.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet [10ec:8169] (rev 10)
80:01.0 Audio device [0403]: VIA Technologies, Inc. VIA High Definition Audio Controller [1106:3288] (rev 10)

Hard-drive details as reported by hdparm -I:
/dev/sda:

ATA device, with non-removable media
    Model Number:       FUJITSU MHY2250BH
    Serial Number:      K407T7A25THF
    Firmware Revision:  0000000B
    Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5; Revision: ATA8-AST T13 Project D1697 Revision 0b
Standards:
    Used: ATA-8-ACS revision 3f
    Supported: 8 7 6 5
Configuration:
    Logical         max     current
    cylinders       16383   16383
    heads           16      16
    sectors/track   63      63
    --
    CHS current addressable sectors:    16514064
    LBA    user addressable sectors:   268435455
    LBA48  user addressable sectors:   488397168
    device size with M = 1024*1024:       238475 MBytes
    device size with M = 1000*1000:       250059 MBytes (250 GB)
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, no device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    Advanced power management level: 128
    Recommended acoustic management value: 254, current value: 254
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
            Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    DOWNLOAD_MICROCODE
       *    Advanced Power Management feature set
            SET_MAX security extension
       *    Automatic Acoustic Management feature set
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    WRITE_{DMA|MULTIPLE}_FUA_EXT
       *    64-bit World wide name
       *    IDLE_IMMEDIATE with UNLOAD
            Disable Data Transfer After Error Detection
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    SATA-I signaling speed (1.5Gb/s)
       *    Native Command Queueing (NCQ)
       *    Host-initiated interface power management
       *    Phy event counters
            DMA Setup Auto-Activate optimization
            Device-initiated interface power management
       *    Software settings preservation
       *    SMART Command Transport (SCT) feature set
       *    SCT LBA Segment Access (AC2)
       *    SCT Error Recovery Control (AC3)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
Security:
    Master password revision code = 65534
        supported
    not enabled
    not locked
    not frozen
    not expired: security count
    not supported: enhanced erase
    250min for SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 5000e040f1a7bd
    NAA         : 5
    IEEE OUI    : e
    Unique ID   : 040f1a7bd
Checksum: correct

Kernel messages related to driver initialization:
[ 2.568109] pata_via 0000:00:0f.0: version 0.3.3
[ 2.568313] scsi0 : pata_via
[ 2.568748] scsi1 : pata_via
[ 2.573314] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14
[ 2.573418] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xff08 irq 15
[ 2.760280] ata1.00: ATA-8: FUJITSU MHY2250BH, 0000000B, max UDMA/100
[ 2.760422] ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32)
[ 2.800304] ata1.00: configured for UDMA/100
[ 2.971844] scsi 0:0:0:0: Direct-Access ATA FUJITSU MHY2250B 0000 PQ: 0 ANSI: 5
[ 2.972976] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 2.973192] sd 0:0:0:0: [sda] Write Protect is off
[ 2.973321] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.973453] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.973938] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 2.974142] sd 0:0:0:0: [sda] Write Protect is off
[ 2.974270] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.974399] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.974588] sda: sda1 sda2 sda3 sda4 sda5 sda6
[ 3.201488] sd 0:0:0:0: [sda] Attached SCSI disk


Kernel error output related to XFS shutdown:
[ 9352.420180] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 9352.420247] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[ 9352.420261] res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 9352.420289] ata1.00: status: { DRDY }
[ 9352.420353] ata1: soft resetting link
[ 9352.650317] ata1.00: configured for UDMA/100
[ 9352.650374] end_request: I/O error, dev sda, sector 6410215
[ 9352.650432] ata1: EH complete
[ 9352.650654] I/O error in filesystem ("sda3") meta-data dev sda3 block 0x203fa6 ("xlog_iodone") error 5 buf count 32768
[ 9352.650824] xfs_force_shutdown(sda3,0x2) called from line 1027 of file /usr/src/linux-2.6.27-rc5-git9/fs/xfs/xfs_log.c. Return address = 0xc020ccba
[ 9352.651304] Filesystem "sda3": Log I/O Error Detected. Shutting down filesystem: sda3
[ 9352.651332] Please umount the filesystem, and rectify the problem(s)
[ 9352.651395] xfs_force_shutdown(sda3,0x2) called from line 790 of file /usr/src/linux-2.6.27-rc5-git9/fs/xfs/xfs_log.c. Return address = 0xc020dfce
[ 9352.654454] sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
[ 9352.659345] xfs_force_shutdown(sda3,0x2) called from line 790 of file /usr/src/linux-2.6.27-rc5-git9/fs/xfs/xfs_log.c. Return address = 0xc020dfce
[ 9352.988239] sd 0:0:0:0: [sda] Write Protect is off
[ 9352.988277] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 9353.026123] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 9383.090107] Filesystem "sda3": xfs_log_force: error 5 returned.
[ 9413.090091] Filesystem "sda3": xfs_log_force: error 5 returned.
[ 9443.090112] Filesystem "sda3": xfs_log_force: error 5 returned.
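In libata's `cmd` line the first byte is the ATA command opcode, followed by the remaining taskfile registers. So `cmd ea/...` above is opcode 0xEA, FLUSH CACHE EXT: the drive timed out on a cache flush, which fits the cache-flushing hint mentioned earlier. The helper below is a hypothetical sketch for decoding such lines; the opcode table covers only a few common commands and is not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A small subset of ATA command opcodes (ATA/ATAPI command set). */
static const struct {
    int opcode;
    const char *name;
} ata_cmds[] = {
    { 0x25, "READ DMA EXT" },
    { 0x35, "WRITE DMA EXT" },
    { 0x60, "READ FPDMA QUEUED" },
    { 0x61, "WRITE FPDMA QUEUED" },
    { 0xC8, "READ DMA" },
    { 0xCA, "WRITE DMA" },
    { 0xE7, "FLUSH CACHE" },
    { 0xEA, "FLUSH CACHE EXT" },
};

/* Extract the opcode from a libata line such as
 * "ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0".
 * Returns the opcode, or -1 if no "cmd XX" token is present. */
static int parse_cmd_opcode(const char *line)
{
    assert(line != NULL);
    const char *p = strstr(line, "cmd ");
    unsigned int op;
    if (p == NULL || sscanf(p, "cmd %2x", &op) != 1)
        return -1;
    return (int)op;
}

/* Map an opcode to a readable name, or "unknown" if not in the table. */
static const char *ata_cmd_name(int opcode)
{
    for (size_t i = 0; i < sizeof(ata_cmds) / sizeof(ata_cmds[0]); i++)
        if (ata_cmds[i].opcode == opcode)
            return ata_cmds[i].name;
    return "unknown";
}
```

For the line logged at 9352.420247, parse_cmd_opcode() yields 0xEA and ata_cmd_name() yields "FLUSH CACHE EXT".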