Hello All,
I have observed some behavior under certain failure conditions that seems
as if the kernel may be ignoring write errors to disk.
During very heavy read/write I/O, if we force a disk to fail, requests
continue to be submitted until the controller's queue is full.
Ultimately, the requests are timed out by the controller. When this
happens we see filesystem corruption. Sometimes it's the file data,
other times it's filesystem metadata that has been timed out and
failed. Either way, it's obviously undesirable behavior.
It looks like the OS/filesystem (ext2/3 and reiserfs) does not
wait for a successful completion. Is this assumption correct?


It depends. Obviously if you disconnect your hard drive, the writes
will fail with a time-out. But they fail after a number of retries
(it depends upon the type of disk and its driver). So, if you
"force" a timeout by disconnecting a drive, you don't have
the same situation as a normally failed write.

Disk/file writes go like this (assuming no sync() or fsync()).

(1) File data gets flushed to a queue.
(2) When the queue gets nearly full, based upon an LRU mechanism,
data are written to the disk.
(3) If the disk-write fails, the driver retries the write.
(4) If the write continues to fail (e.g., timeout, no disk, etc.),
the kernel gives up and does not hang forever. If you have
disconnected the drive, syslog can't write to the device
either, so your next boot won't show the event. It looks
as though it was ignored.
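
The steps above assume no sync() or fsync(). For contrast, here is a
minimal sketch of how an application can ask to be told about a failed
write-back instead of relying on the queue. The helper name
write_durably() is hypothetical and used only for illustration. It
assumes a POSIX system where fsync() reports a write-back failure
(typically EIO) through its return value.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the original post): write len bytes to
 * path and return 0 only after fsync() and close() have both succeeded.
 * On any failure it returns the negated errno value.
 */
int write_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the write */
            int err = errno;
            close(fd);
            return -err;
        }
        p += n;                     /* handle short writes */
        left -= (size_t)n;
    }

    /*
     * A successful write() normally means only that the data reached
     * the kernel's queue. fsync() waits for the write-back and is where
     * a device error should be reported.
     */
    if (fsync(fd) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) < 0)
        return -errno;
    return 0;
}
```

A caller would treat any nonzero return as "the data may not be on
disk", for example: if (write_durably("/tmp/example.dat", "hello", 5) != 0)
then report the error and retry or abort. Without the fsync() call, a
failure in step (4) happens after write() has already returned success,
so the application never sees it.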

You can observe the behavior by mounting a floppy disk and
then removing it while it is being written. There are many
attempts to write to the device and then that write is discarded.

