Re: sata_via

From: Rich West
Date: Fri Apr 04 2008 - 20:00:28 EST




Jeff Garzik wrote:
Rich West wrote:
On my mythtv backend system, the recordings volume tends to get pounded
rather hard (up to 5 recordings (some HD) at once with multiple frontend
systems reading from that same volume). I recently (4 months ago)
upgraded the system to a motherboard that happened to have the VIA
chipset on it.

Since that time, I have had some bizarre problems with that volume. After
a seemingly random amount of time, the kernel would report an error with
the volume and put it in read-only mode. However, the volume would not
really be read-only; it would be completely inaccessible. Unmounting the
volume would succeed, but re-mounting it would fail.

I've replaced the drive (with an identical one), tested memory, changed
filesystems (it was LVM + ext3, then just ext3) and the problem persists.

Running 2.6.24.4-64 (Fedora 8).

Here is a larger snippet from the messages log (dmesg gets cleared after a reboot):
Apr 3 16:47:27 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Apr 3 16:47:27 mythtv1 kernel: ata4.00: cmd c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
Apr 3 16:47:27 mythtv1 kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 3 16:47:27 mythtv1 kernel: ata4.00: status: { DRDY }
Apr 3 16:47:27 mythtv1 kernel: ata4: soft resetting link
Apr 3 16:47:57 mythtv1 kernel: ata4.00: qc timeout (cmd 0x27)
Apr 3 16:47:57 mythtv1 kernel: ata4.00: failed to read native max address (err_mask=0x4)
Apr 3 16:47:57 mythtv1 kernel: ata4.00: HPA support seems broken, will skip HPA handling
Apr 3 16:47:57 mythtv1 kernel: ata4.00: revalidation failed (errno=-5)
Apr 3 16:47:57 mythtv1 kernel: ata4: failed to recover some devices, retrying in 5 secs
Apr 3 16:48:02 mythtv1 kernel: ata4: soft resetting link
Apr 3 16:48:02 mythtv1 kernel: ata4.00: configured for UDMA/133
Apr 3 16:48:02 mythtv1 kernel: ata4: EH complete
Apr 3 16:49:02 mythtv1 kernel: ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Apr 3 16:49:02 mythtv1 kernel: ata4.00: cmd c8/00:00:77:31:21/00:00:00:00:00/e1 tag 0 dma 131072 in
Apr 3 16:49:02 mythtv1 kernel: res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Apr 3 16:49:02 mythtv1 kernel: ata4.00: status: { DRDY }
Apr 3 16:49:02 mythtv1 kernel: ata4: soft resetting link
Apr 3 16:49:03 mythtv1 kernel: ata4.00: configured for UDMA/133
Apr 3 16:49:03 mythtv1 kernel: ata4: EH complete
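
For anyone wanting to see how often these exceptions occur, here is a rough Python sketch that picks the libata "exception" lines out of a syslog-style file and counts them per device. It assumes each event sits on one physical line in the log file itself (the excerpt as mailed may be line-wrapped) and that the line layout matches the excerpt above. The names EXCEPTION_RE and count_ata_exceptions are made up for this illustration and are not part of any tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the excerpt above:
#   "Apr 3 16:47:27 mythtv1 kernel: ata4.00: exception Emask 0x0 ..."
EXCEPTION_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<dev>ata\d+(?:\.\d+)?): exception\b"
)


def count_ata_exceptions(lines):
    """Return (exceptions per device, list of (timestamp, device)) for the given lines."""
    counts = Counter()
    events = []
    for line in lines:
        match = EXCEPTION_RE.match(line)
        if match:
            counts[match.group("dev")] += 1
            events.append((match.group("stamp"), match.group("dev")))
    return counts, events
```

Passing an open file works, since a file object iterates over its lines, for example count_ata_exceptions(open("/var/log/messages")). The timestamps in the returned list can then be compared against the last boot time to see how long the system ran before the first failure.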

It is almost as if I am hitting some bug that is causing the drive to
fall off, but I really don't know where else to look or where else to
turn...

I'm tempted to just go back to using a PATA drive (smaller. :( ) to
avoid the problem. I'm just at a loss as to how it can actually be solved.

This timeout/DRDY message has been a common one recently. Some of the issues causing it may be resolved in 2.6.25-rc. Can you try that?

Also, if you could build and test some older kernels to see when this behavior first appeared, that would be quite helpful.

Overall, a timeout _might_ be a problem with libata (the kernel SATA drivers), or it _might_ be a problem with your system's interrupt delivery (sometimes an ACPI or BIOS problem). Try booting with 'noapic' or 'acpi=off'.
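
When testing these boot options, it helps to confirm that the running kernel actually received them. A minimal Python sketch, assuming a Linux system where /proc/cmdline holds the boot command line; the function names interrupt_workarounds and read_cmdline are hypothetical and chosen only for this example:

```python
def interrupt_workarounds(cmdline):
    """Return which of 'noapic' and 'acpi=off' appear in a kernel command line string."""
    wanted = {"noapic", "acpi=off"}
    # Kernel parameters are whitespace-separated tokens.
    return sorted(wanted.intersection(cmdline.split()))


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read().strip()
```

Calling interrupt_workarounds(read_cmdline()) after a reboot shows which of the two options took effect, which is worth recording next to each test run so results from different boots are not mixed up.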


Thanks for the quick response.

I know this problem was happening with all of the Fedora 7 supplied kernels (from initial release up until about a week ago) and has happened with each of the Fedora 8 supplied kernels. I'll try rolling 2.6.25-rc to see if the problem resurfaces. Unfortunately, I don't know what collision of events causes this problem to erupt, but it usually happens within 7 days of a reboot (sometimes within hours of a reboot).

I'll give noapic a try, but (dumb question) what does acpi=off buy?

-Rich