Silent data corruption with kernel 3.4 and FireWire disks

From: Stefan Richter
Date: Sat Jun 02 2012 - 05:55:32 EST


About a week ago I noticed silent data corruption of files on FireWire
disks. To reproduce: mount the disk, read lots of data and e.g. compute
the files' md5sums, unmount the disk, mount it again, then read and
md5sum the same files again -> the MD5s may differ.
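
For anyone who wants to repeat the check, here is a rough sketch of the
comparison in Python. It is not the script I used; the function names
are made up for illustration, and the unmount/mount step between the two
passes still has to be done by hand so that the second pass reads from
the disk and not from the page cache.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root):
    """Map each regular file's path (relative to root) to its MD5 digest."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                sums[os.path.relpath(full, root)] = md5_of_file(full)
    return sums


def differing_files(before, after):
    """Return sorted paths whose digest changed or that exist in only one pass."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))
```

Usage would be: call checksum_tree() on the mount point (whatever path
the disk is mounted at), unmount and mount the disk again, call
checksum_tree() a second time, and pass both results to
differing_files(). A non-empty result means at least one of the two
reads returned different data for that file.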

Defects in files that were written in May hint that not only reading from
but also writing to FireWire disks resulted in corrupt data. This was
silent corruption without any error messages from the PCI, firewire, SCSI,
block, or filesystem subsystems.

Affected:
- kernel 3.4
- kernel 3.4-rc5
Not affected:
- kernel 3.3.1 (which I have been running now for the last 6 days)

I used these three kernels with the same patchlevel of FireWire drivers,
namely roughly those which are about to be released in 3.5-rc1. FireWire
disks with different 1394-to-SATA or -IDE bridge chips are affected. I
first noticed the problem on an Agere FW643e PCIe 1394 controller which
sits behind a PLX PEX 8505 PCIe switch.

MPEG2TS video reception through the same 1394 controller and PCIe switch
never showed a noticeable sign of corruption.

I have not yet had time to systematically test
- whether all of my FireWire controllers are affected,
- whether SATA or USB disks are affected (SATA probably not, USB not
used yet),
- whether my secondary Linux PC is affected.

Kernel 3.4 and 3.4-rc5 exhibited another (seemingly harmless but
suspicious) issue on my primary PC: frequent transmit queue time-outs of
an RTL8111/8168B Ethernet interface,
http://www.spinics.net/lists/netdev/msg197032.html

Being busy at work lately and not having Linux available at work, I will
be slow to look further into it. With enough spare time, it should be
possible to identify the regression by bisection between kernel 3.3 and
3.4-rc, but I cannot estimate when I will be able to spend that time.
--
Stefan Richter
-=====-===-- -==- ---=-
http://arcgraph.de/sr/