Re: ext3_ordered_writepage() questions

From: Theodore Ts'o
Date: Fri Mar 17 2006 - 17:08:54 EST


On Fri, Mar 17, 2006 at 04:50:21PM -0500, Stephen C. Tweedie wrote:
>
> It's *only* for updating existing data blocks that there's any
> justification for writing mtime first. That's the question here.
>
> There's a significant cost in forcing the mtime to go first: it means
> that the VM cannot perform any data writeback for data written by a
> transaction until the transaction has first been committed. That's the
> last thing you want to be happening under VM pressure, as you may not in
> fact be able to close the transaction without first allocating more
> memory.

Actually, we're not even able to force the mtime to happen first in
this case. In ordered mode, we still force the data blocks *first*,
and only later do we force the mtime update out. With Badari's
proposed change, we completely decouple the writeout of the data
blocks from the mtime update; the data could go out before or after
it, at the OS's convenience.

If the application cares about the precise ordering of data blocks
being written out with respect to the mtime field, I'd respectfully
suggest that the application use data journalling mode --- and note
that most applications which update existing data blocks, especially
relational databases, either don't care about mtime, have their own
data recovery subsystems, or both.

- Ted
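
Editorial illustration (not part of the original message): the point
about applications that update existing data blocks without caring
about mtime can be sketched in C as below. This is a minimal sketch
under stated assumptions, not code from the thread or from ext3.
The helper name overwrite_in_place() is hypothetical. It relies only on
the POSIX calls open(), pwrite(), fdatasync() and close(); POSIX allows
fdatasync() to skip metadata that is not needed to read the data back,
and a changed mtime is one such item.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite len bytes at offset in an EXISTING file,
 * then ask the kernel to flush the file data to stable storage.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 */
int overwrite_in_place(const char *path, off_t offset,
                       const void *buf, size_t len)
{
    /* No O_CREAT or O_TRUNC: we only want to update existing blocks. */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, offset);
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted before writing: retry */
        if (n <= 0) {               /* real error, or no progress at all */
            close(fd);
            return -1;
        }
        p += n;                     /* handle short writes */
        offset += n;
        len -= (size_t)n;
    }

    /*
     * fdatasync() waits for the data to reach storage. It need not
     * flush the updated mtime, so the caller makes no assumption about
     * whether the timestamp or the data hits the disk first.
     */
    if (fdatasync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller that does need data and mtime to be ordered against each
other gets no such guarantee from this sketch; that is the case where
the message above suggests data journalling mode instead.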