Re: Should raw I/O be added to the kernel?

Zygo Blaxell (uixjjji1@umail.furryterror.org)
21 Jan 1999 23:38:15 -0500


In article <199901211735.RAA04836@dax.scot.redhat.com>,
Stephen C. Tweedie <sct@redhat.com> wrote:
>On Mon, 21 Dec 1998 22:33:06 +0100 (MET), Gerard Roudier
><groudier@club-internet.fr> said:
>> If you want performances, then you also want to queue several IOs to the
>> device at a time, each time it is possible. Imagine now, that you got an
>> IO error on some IO currently being processed by the device. ...
>
>>> Exactly right. Without raw disk IO, you couldn't guarantee a database
>>> to always be in a consistent state on the disk (i.e. it isn't very
>
>> I reply you that even with raw disk IO, it is not that easy to _really_
>> guarantee such a consistency without being _really_ aware of what may
>> _really_ happen with _real_ IOs, and that _real_ IO system services
>> are generally too poor to allow full control on what _really_ happens
>> with IOs.
>
>Yes you can. The way these applications work is to write all of the

No you can't. Suppose you send a bunch of raw writes to a SCSI disk drive.
OK, so the SCSI disk drive queues them in its embedded cache RAM and tells
the host CPU to send more data. Then the power fails before the SCSI drive
can flush its embedded cache.

Oops.

The software (kernel or application or system configuration/install
program) has to know how to tell the drive that it is not to use its
write cache *at all*, or if it is to use it, to use it to perform writes
in *strict* compliance with the order given, and to *not* acknowledge
anything until all writes have been completed. And of course those
settings had better not vanish if the SCSI bus is reset due to say heat
problems.

There's the possibility of external RAID devices that will undo all that
work for you by doing buffering and ACK's by themselves, then turning
around to talk to disks with data in cache.

Some of the more exotic storage types will actually put several disk
blocks in jeopardy when rewriting only one or two, in the name of
enhanced error correction capability or some such thing. If the kernel
orders a 512-byte sector to be updated to the hardware, will the drive
overwrite only that sector, or will it rewrite the entire track with
new Reed-Solomon codes or some such thing? In other words, if the power
fails, is one sector trashed or an entire cylinder?

And of course we assume here that the disk won't lose its marbles if
hardware failure occurs during write; what if a power problem starts a head
moving across the disk platter with the write head engaged?

Oops.

I won't mention hard disk firmware bugs. <shiver> Let's not go there.

Assuming you can herd these cats, then everything you say is true.
But that's a big assumption and it's not something you can say without
carefully reading the labels on all the boxes. Raw disk I/O is probably
necessary but definitely not sufficient.

-- 
Zygo Blaxell                         (with a name like that, who needs a nick?)
Linux Engineer                          (my favorite official job title so far)
Corel Corporation      (whose opinions sometimes differ from those shown above)
zygob@corel.ca                                  (also zblaxell@furryterror.org)

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/