SB600 AHCI: Hard Disk Corruption

From: Patrick
Date: Sun May 25 2008 - 08:27:38 EST


Hello (Tejun Heo *)

I've got an annoying problem with my Athlon 64, 4 GB RAM, ASUS M2A-VM
(-> SB600 AHCI controller), SAMSUNG HD501LJ SATA disk. I'm using kernel
2.6.26-rc3. Everything works fine, except for standby/suspend/hibernate.
Standby freezes; hibernate I actually haven't tested lately, because I
want suspend to RAM to work first.

"echo mem > /sys/power/state; vbetool post;" (on a text console)
successfully suspends the system, and it resumes as well, BUT: after
resuming, things quickly turn bad: "file not found", and the kernel reports
ext2 errors on the root (LVM) partition. After a (hard) reboot the root
filesystem isn't even recognized by mount anymore, and e2fsck can hardly
recover it (thousands of inodes go to lost+found; I have to restore
backups to make the system work again). This happened even when the
partition was mounted _read-only_, and it happens to ALL partitions
mounted during suspend. ** I'm now testing by appending break=init to
the kernel command line, dropping into a busybox shell in the initramfs,
and unmounting "root" before suspending. From there I can use dmesg to see
what's happening (though the dmesg buffer is quite small... can I
increase that in /proc somewhere?). I'd be willing to test and send
whatever logs you need to get this fixed.

Some additional info: when upgrading from 2.6.24, I hoped that
AHCI_HFLAG_NO_MSI in drivers/ata/ahci.c might solve the issue - no luck.
All the other SB600 workarounds: obviously no luck either.
irqpoll: slightly different behaviour when unloading the sd_mod and ahci
modules before suspending:
- without irqpoll, the disk ([sda]) doesn't show up again after "modprobe
  ahci; modprobe sd_mod", and I get "ata5.00: failed to IDENTIFY [...]
  err_mask=0x80" and "failed to restore some devices [...]" errors
- with irqpoll, the disk shows up again with no errors, but "there is
  different data" on each read (head -c10000) from /dev/sda. The disk
  itself is not changed: after rebooting it contains the original data.
  I just wonder how the data is "created" - it seems to be disk content
  from different locations (not the beginning) of the disk - if I
  "dd if=/dev/sda of=/dev/null", I hear the disk reading data...
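A minimal sketch of how the "different data on each read" check could be made repeatable, assuming Python 3.9+ on Linux and read access to the device (normally root). It reads the first 10000 bytes several times, hinting the kernel via posix_fadvise(POSIX_FADV_DONTNEED) to drop cached pages before each read so later reads are more likely to go to the disk, and returns one SHA-256 digest per read. The names read_prefix and compare_reads are hypothetical helpers for illustration, not part of any existing tool. The device is opened read-only, so nothing is written.

```python
import hashlib
import os


def read_prefix(path: str, length: int = 10000) -> bytes:
    """Read up to `length` bytes from the start of `path` (read-only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Hint (not a guarantee) that cached pages for this file/device may be
        # dropped, so the following read is more likely to hit the hardware.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        chunks = []
        remaining = length
        while remaining > 0:
            chunk = os.read(fd, remaining)
            if not chunk:  # end of file/device reached
                break
            chunks.append(chunk)
            remaining -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)


def compare_reads(path: str, length: int = 10000, rounds: int = 3) -> list[str]:
    """Return one SHA-256 hex digest per read of the first `length` bytes."""
    return [
        hashlib.sha256(read_prefix(path, length)).hexdigest()
        for _ in range(rounds)
    ]
```

Calling compare_reads("/dev/sda") as root after resume and getting more than one distinct digest would reproduce the symptom in a scriptable way. Because the fadvise call is only a hint, identical digests do not completely rule out page-cache effects.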

Well - I hope you might be able to make some sense of that and tell me
what logs and dumps exactly you need to fix it...

Greets - Patrick



* I've read many threads in which Tejun provided patches for the SB600 AHCI
controller, which seems to be seriously broken - if only I had known that in
advance... Maybe he can fix this issue as well - last resort. Otherwise
I'll burn that mobo!

** After my first install and a day of configuring the system, trying
out suspend to RAM smashed it with no backups. Since then I didn't learn
my lesson and smashed it again 2-3 times, though this time with backups
at hand...



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/