Re: Linux does not care for data integrity

From: Bill Davidsen
Date: Wed Jun 01 2005 - 13:58:59 EST


Matthias Andree wrote:
On Sun, 29 May 2005, Greg Stark wrote:


Oracle, Sybase, Postgres, other databases have hard requirements. They
guarantee that when they acknowledge a transaction commit the data has been
written to non-volatile media and will be recoverable even in the face of a
routine power loss.

They meet this requirement just fine on SCSI drives (where write caching
generally ships disabled) and on any OS where fsync issues a cache flush. If


I don't know what facts "generally ships disabled" is based on, all of
the more recent SCSI drives (non SCA type though) I acquired came with
write cache enabled and some also with queue algorithm modifier set to 1.


Worse, if the disk flushes the data to disk out of order it's quite
likely the entire database will be corrupted on any simple power
outage. I'm not clear whether that's the case for any common drives.


It's a matter of enforcing write order. In how far such ordering
constraints are propagated by file systems, VFS layer, down to the
hardware, is the grand question.

The problem is that in many options required to make that happen in the o/s, hardware, and application are going to kill performance. And even if you can control order of write, unless you can get write to final non-volatile media control you can get a sane database but still lose transactions.

If there was a way for the o/s to know when a physical write was done other than using flushes to force completion, then overall performance could be higher, but individual transaction might have greater latency. And the app could use fsync to force order of write as needed. In many cases groups of writes can be done in any order as long as they are all done before the next logical step takes place.

This would change the meaning of fsync from "force out the data" to "wait for the data to be written" in some implementations.

--
bill davidsen <davidsen@xxxxxxx>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/