Re: ext3_ordered_writepage() questions

From: Jamie Lokier
Date: Fri Mar 17 2006 - 17:42:34 EST


Theodore Ts'o wrote:
> If the application cares about the precise ordering of data blocks
> being written out with respect to the mtime field, I'd respectfully
> suggest that the application use data journalling mode --- and note
> that most applications which update existing data blocks, especially
> relational databases, either don't care about mtime, have their own
> data recovering subsystems, or both.

I think if you believe this only affects "applications" or
individual programs (like databases), then you haven't considered
the example I gave.

Scenario:

- Person has two computers, A and B.
  Maybe a desktop and laptop. Maybe office and home machines.

- Sometimes they do work on A, sometimes they do work on B.
  Things like editing pictures or spreadsheets or whatever.

- They use "rsync" to copy their working directory from A to B, or
  B to A, when they move between computers.

- They're working on A one day, and there's a power cut.

- Power comes back.

- They decide to start again on A, using "rsync" to copy from B to A
  to get a good set of files.

- "rsync" is believed to mirror directories from one place to
  another without problems. It's always worked for them before.
  (Heck, until this thread came up, I assumed it would always work).

- ext3 is generally trusted, so no fsck or anything else special is
  thought to be required after a power cut.

- So after running "rsync", they believe it's safe to work on A.

  This assumption is invalid, because of ext3's data vs. mtime
  write ordering when they were working on A before the power cut.

  But the user doesn't expect this. It's far from obvious (except
  to a very thoughtful techie) that rsync, which normally works and
  even cleans up earlier mistakes, won't give correct results this
  time. (A sketch of rsync's default check follows this list.)

- So they carry on working, with corrupted data. Maybe they won't
  notice for a long time, and the corruption stays in their work
  project.
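
To make that concrete, here is a rough sketch (in Python, purely for
illustration; the function name and the whole-second mtime comparison
are mine, not rsync's actual code) of the decision rsync makes by
default: a destination file whose size and mtime already match the
source's is skipped without its contents ever being read, so stale
data blocks left behind by the crash go unnoticed.

    import os

    def quick_check_says_up_to_date(src, dst):
        # Sketch of rsync's default "quick check": only size and
        # mtime are compared; file contents are never read. A file
        # on A whose data blocks were lost in the crash, but whose
        # size and on-disk mtime still match B's copy, is skipped.
        try:
            s = os.stat(src)
            d = os.stat(dst)
        except OSError:
            return False  # missing file: rsync would transfer it
        return (s.st_size == d.st_size
                and int(s.st_mtime) == int(d.st_mtime))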

No individual program or mount option is at fault in the above
scenario. Only the combination creates a fault, and only after a
power cut. The same usage is fine in normal operation and for all
the other typical errors which affect files.

Technically, using data=journal, or --checksum with rsync, would be fine.

But nobody _expects_ to have to do that. It's a surprise.

And both imply a big performance overhead, so nobody is ever
advised to use them just to be safe for "ordinary" day-to-day work.
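
To be clear about where that overhead comes from (again just my own
illustration of the effect, not rsync's actual algorithm): --checksum
means reading and digesting every byte of both copies instead of doing
a single stat() per file, and data=journal means writing all file data
twice, once to the journal and once to its final location.

    import hashlib

    def contents_match(src, dst, chunk_size=1 << 20):
        # Roughly what forcing a full-content comparison costs:
        # every byte of both files is read and digested. (rsync's
        # real checksum algorithm and protocol differ; this only
        # illustrates the extra I/O compared with the quick check.)
        def digest(path):
            h = hashlib.md5()
            with open(path, 'rb') as f:
                for block in iter(lambda: f.read(chunk_size), b''):
                    h.update(block)
            return h.digest()
        return digest(src) == digest(dst)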

-- Jamie