On Sat, 11 Feb 2006, Nick Piggin wrote:
Your pattern would actually be
.. dirty offset 100-200 ..
fadvice(fd, 100, 200, FADV_WRITE_START);
.. dirty offset 200-300 ..
fadvice(fd, 200, 300, FADV_WRITE_START);
.. dirty offset 300-400 ..
fadvice(fd, 300, 400, FADV_WRITE_START);
.. dirty offset 400-415 .. (for the next transaction)
- IOW if the app or OS crashed here it would be possible to see 400-415 on
the disk and none of the previous transactions (assuming we don't know
the page size).
If the app/OS crashed here, nothing would matter. We haven't committed anything at all yet. We've just started the IO. What is at 400-415 simply doesn't matter, because nobody would have any reason to look at it.
(Besides, it's not at all clear that 400-415 would or would not be on disk. It depends entirely on timing and buffering of the IO system at that point - the fact that it's dirty in memory doesn't mean that it ever made it into the IO buffer that was started).
fadvice(fd, 100, 400, FADV_JUST_WAIT); (for the previous one)
This is the one that waits for it to finish, so _now_ we can update the pointers (elsewhere) to that log (and if the app/OS crashes before that, nobody will even know about it).
See?
I'm not convinced. Your above example was bogus.
No, your understanding was incomplete. I'm talking about just parts of a much bigger transaction.
A single write on its own is almost never a transaction unless your system is _purely_ log-based (which it could be, of course. Not in my example).
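For concreteness, here is a rough sketch of the start-then-wait pattern described above. FADV_WRITE_START and FADV_JUST_WAIT are hypothetical flags from this discussion, so the sketch substitutes the Linux sync_file_range() call, which takes (offset, length) rather than (start, end). The helper names dirty_and_start(), wait_for_range() and write_transaction() are made up for illustration. Note the assumption that waiting for page writeback is enough for the example: sync_file_range() does not flush file metadata or the drive's write cache, so a real commit path would likely need fdatasync() or similar as well.

```c
#define _GNU_SOURCE          /* exposes sync_file_range() in <fcntl.h> */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Dirty [off, off+len) and ask the kernel to start writing it out.
 * This does not wait for the IO to complete. */
static int dirty_and_start(int fd, off_t off, size_t len, char fill)
{
    char buf[128];

    if (len > sizeof buf)
        return -1;
    memset(buf, fill, len);
    if (pwrite(fd, buf, len, off) != (ssize_t)len)
        return -1;
    return sync_file_range(fd, off, (off_t)len, SYNC_FILE_RANGE_WRITE);
}

/* Block until writeback of [off, off+len) has finished. */
static int wait_for_range(int fd, off_t off, off_t len)
{
    return sync_file_range(fd, off, len,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
}

/* Returns 0 once 100-400 has been written out. Only after that should
 * the caller update whatever pointer (elsewhere) makes the log visible. */
static int write_transaction(int fd)
{
    if (dirty_and_start(fd, 100, 100, 'a') != 0)
        return -1;
    if (dirty_and_start(fd, 200, 100, 'b') != 0)
        return -1;
    if (dirty_and_start(fd, 300, 100, 'c') != 0)
        return -1;

    /* 400-415 belongs to the next transaction: dirtied, but no
     * writeback is requested and nothing below waits for it. */
    if (pwrite(fd, "next transaction", 15, 400) != 15)
        return -1;

    /* Wait for the previous transaction only. */
    return wait_for_range(fd, 100, 300);
}
```

If the app or OS dies anywhere before write_transaction() returns, nothing points at 100-400 yet, so whatever did or did not reach the disk (including 400-415) is never looked at.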