Re: SATA problems and fs corruption on recent kernels

From: Robert Hancock
Date: Mon Aug 11 2008 - 20:07:55 EST


Fabio Coatti wrote:
Hi all,
I'm facing a quite annoying problem with SATA disks. Googling a bit, I've seen several references to similar issues, but no hint on how to solve them.
Short description, details below and on request ;) : on a quite old Pentium IV / Abit IC7G mobo, I've started to see SATA lockups when moving files of 4~15 MB in size. I do this quite often (photos, actually), and prior to 2.6.25.something I can't recall a single problem. On that machine I have 3 SATA disks, both Maxtor and Seagate. The lockup caused XFS corruption, and a simple reset is not enough: I have to turn off the power to get the drive responding again, otherwise the machine will stop at POST.
It doesn't matter which HDs are involved in the file transfer: it can happen when moving files between partitions of the same disk, between different disks, and between SATA and USB disks as well.
The same configuration worked without a glitch for years, using the sata_sil and ata_piix drivers (that mobo has two controllers).

Since then, I've changed hardware: new mobo (Asus M3N-HT), new processor, kernel and even some disks (I've added a new one). Of course new cables and power supply too. So I think that a hardware culprit can be excluded.
The driver has changed as well: I now use AHCI mode for the SATA disks. Tried with 2.6.26.2.
The behaviour is exactly the same: moving files (more or less of the same size as before) causes an HD lockup so bad that it needs a power cycle to recover, otherwise the POST will fail AHCI detection of the drive (for those used to that controller, it waits for some seconds with a "Port:00" message, then the POST process locks up).
Now even a mount of the damaged XFS partition can trigger the freeze: I can only see that XFS starts the recovery, then the HD light stops blinking (stays on), and after that even an "ls" on the drive remains stuck. This happens on a brand new 500Mb SATA disk.
So it seems that neither the hardware, nor the 64- or 32-bit CPU/kernel, nor the low-level drivers can explain this. I've tried only with XFS, but it sounds strange that a filesystem can lock up a drive.
The hardware I'm using is an AMD Phenom 9850, M3N-HT mobo, 2.6.26.2 kernel, Gentoo 2008.0, and SATA HDs from Seagate and Maxtor of different sizes and models, with the AHCI SATA driver.
Working on small files seems to be fine, as I can compile kernels and I installed the system without problems.
Now I will try several things to get more clues. I can downgrade kernels to see if the situation changes (dunno if the new mobo is compatible with too-old kernels...), but if someone can give me some hints about which tests have to be made and which information I must provide, they will be most welcome.
Thanks for any help.

Things locking up badly enough that even the BIOS POST fails to detect the drives, or locks up itself, really seems like a hardware problem to me. Are you still using some of the same disks from the old machine?