Re: [PATCH 0/3] mm: Swap checksum

From: Cesar Eduardo Barros
Date: Wed May 26 2010 - 19:19:31 EST


On 26-05-2010 19:45, Minchan Kim wrote:
On Thu, May 27, 2010 at 6:28 AM, <Valdis.Kletnieks@xxxxxx> wrote:
On Thu, 27 May 2010 00:31:44 +0900, Minchan Kim said:
On Wed, May 26, 2010 at 07:21:57AM -0300, Cesar Eduardo Barros wrote:
far as I can see, does nothing against the disk simply failing to
write and later returning stale data, since the stale checksum would
match the stale data.

Sorry, I can't understand your point.
Who makes the data stale? If some layer makes the data stale, integrity is up
to that layer. Maybe I am missing your point.
Could you explain in more detail?

I'm pretty sure that what Cesar meant was that the following could happen:

1) Write block 11983 on the disk, checksum 34FE9B72.
(... time passes.. maybe weeks)
2) Attempt to write block 11983 on disk with checksum AE9F3581. The write fails
due to a power failure or something.
(... more time passes...)
3) Read block 11983, get back data with checksum 34FE9B72. Checksum matches,
and there's no indication that the write in (2) ever failed. The program
proceeds thinking it's just read back the most recently written data, when in
fact it's just read an older version of that block. Problems can ensue if the
data just read is now out of sync with *other* blocks of data - instant data
corruption.
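
The three steps above can be sketched as a toy simulation. This is only an illustration, not code from the patch series: the Disk class, its "lost" flag, and the use of CRC32 are assumptions made for the example, and the checksum values will not match the illustrative ones quoted above.

```python
import zlib


class Disk:
    """Toy block device that stores each block's checksum next to its data."""

    def __init__(self):
        self.blocks = {}  # block number -> (data, CRC32 of data)

    def write(self, block, data, lost=False):
        # A lost write is acknowledged to the caller but never reaches
        # the medium, so the old data and old checksum both survive.
        if not lost:
            self.blocks[block] = (data, zlib.crc32(data))

    def read(self, block):
        data, stored_crc = self.blocks[block]
        if zlib.crc32(data) != stored_crc:
            raise IOError("checksum mismatch on block %d" % block)
        return data


disk = Disk()
disk.write(11983, b"old contents")               # step 1
disk.write(11983, b"new contents", lost=True)    # step 2: silently dropped
result = disk.read(11983)                        # step 3: checksum verifies
print(result)  # the stale block comes back with no error raised
```

Because the checksum lives in the same place as the data it covers, both go stale together and the verification in read() has nothing to disagree with.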

Oh, don't normal disks support atomic sector writes?
I had thought that disks must at least support atomic sector writes.

It is called a "high fly write" (a write where the disk head was flying too high and the data did not get written at all). There are other causes than high fly writes for this, of course, but the symptom is the same: whatever you were trying to write was not written at all, and the old contents are still there.

The write is still atomic: it either did happen completely or did not happen at all (in this case, it did not happen at all). You *can* have a partial write (with a well-timed power loss, for instance), but the disk's own ECC will detect this as a corrupted sector and return an error on read.
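
The difference between the two failure modes can be sketched as follows. This is a rough model under stated assumptions, not how any particular drive works: the Sector class and its "mode" parameter are made up for the example, and CRC32 stands in for the disk's real ECC.

```python
import zlib

SECTOR_SIZE = 512


class Sector:
    """Toy sector: a payload plus a per-sector code (CRC32 as a stand-in for ECC)."""

    def __init__(self, payload):
        self.payload = payload
        self.ecc = zlib.crc32(payload)

    def write(self, payload, mode="ok"):
        if mode == "lost":
            # High fly write: nothing reaches the platter at all.
            return
        if mode == "torn":
            # Power lost mid-sector: only the first half is overwritten,
            # and the trailing ECC is never updated.
            half = len(payload) // 2
            self.payload = payload[:half] + self.payload[half:]
            return
        self.payload = payload
        self.ecc = zlib.crc32(payload)

    def read(self):
        if zlib.crc32(self.payload) != self.ecc:
            raise IOError("unrecoverable read error")
        return self.payload


OLD = b"A" * SECTOR_SIZE
NEW = b"B" * SECTOR_SIZE

lost = Sector(OLD)
lost.write(NEW, mode="lost")
lost_result = lost.read()      # succeeds and returns the old contents

torn = Sector(OLD)
torn.write(NEW, mode="torn")
try:
    torn.read()
    torn_detected = False
except IOError:
    torn_detected = True       # the half-written sector fails its own check

print(lost_result == OLD, torn_detected)
```

In this model the torn sector is caught by the per-sector code, while the lost write reads back cleanly, which is why the lost write is the case a checksum stored alongside the data cannot help with.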

--
Cesar Eduardo Barros
cesarb@xxxxxxxxxx
cesar.barros@xxxxxxxxx