Many thanks for above responses.
Sounds like Ext3 uses journal to protect the data integrity. In data
journal mode, ext 3 writes data to journal first, marks start to
commit, then marks "done with commit" after data is flushed to disk. If power failure happen during data flush, the system will redo the
data writes next time system is back.
However, how to guarantee the integrity of journal? This solution
works based on an assumption that the journal data has been flushed to
disk before file data is flushed. Otherwise, consider this scenario:
process A wrote a data block to File F. Ext3 first writes this data
block into journal, put a "start to commit" notice, flags that journal
page as dirty. (note that the journal data is not flushed into disk
yet). Then ext3 starts to flag data page as dirty and wait for flush
daemon to write it to disk. Say just when the disk controller writes
2048 out of 4096 bytes into disk, power outage happens. At this time,
journal data has not been flushed into disk, so no enough information
to support redo. The file A will end up with some junk data. So
flushing journal data to disk before starting to write file data to
disk seems to be necessary. If so, how ext3 guarantees that? Is it
because the dirty pages are flushed in a first come first serve
fashion?
Another concern is that the journal data mode requires twice as much
as data to write, this could impact performance if disk bandwidth
usage is over 50%. For small files, it could be rare to use 50%. But
how about large files? In a real world system, what's the probablity
of using over 50% disk bandwidth?
I am sorry for ask for too high integrity on data. But I think ext3 is
a famous stable file system, it should have some good design to
protect data integrity.
Last question, does anyone know whether it is possible that ext3
creates some junk data or makes bitmap and inode inconsistent (under
any extreme condition) ?
Again, thanks for your help!
Xin