Re: [Ext2-devel] Re: [RFD] FS behavior (I/O failure) in kernelsummit

From: Kenichi Okuyama
Date: Wed Jun 15 2005 - 14:43:29 EST


Dear Ted-san, and all,

>>>>> "Ted" == Theodore Ts'o <tytso@xxxxxxx> writes:
Ted> Part of the problem is that we are limited by the constraints of the
Ted> POSIX specification for error handling. For example, we don't have a
Ted> way of telling the application, "the reason why you the filesystem was
Ted> remounted-read-only was in reaction to an I/O error that appears to be
Ted> caused by the multiple CRC checksum errors reported by the SCSI
Ted> controller". We can only return EIO or EROFS. And while the write()
Ted> which causes an I/O error that remounts the filesystem read/only can
Ted> (and probably does) return EIO, any subsequent writes will return
Ted> EROFS, and changing this would be hard, hackish, and probably wouldn't
Ted> be accepted.


You said:

Ted> And while the write()
Ted> which causes an I/O error that remounts the filesystem read/only can
Ted> (and probably does) return EIO

No. they return EROFS from beginning.


Ted> We can only return EIO or EROFS.

I do understand about EIO. What I don't see is EROFS.
EROFS could be returned if file system is being mounted as r/o from
beginning.



The point is pretty easy ( I think ).

Q1. Why does file system succeed in re-mounting as r/o, when device
is totally lost?

If device did exist, and throwing away the dirty pages did succeed,
then unmount that device/mount them as read only, should succeed
too. If this is what's happening, EROFS is good result. I agree with
this.

But in case of Mr. Qu's test, device is lost. USB cabel is
unplugged. They are unreachable. How could such device be *MOUNTED*?
# In other word, why can't I mount device which does not exist,
# while I can re-mount them?



Ted> So instead of trying to standardize the existing error returns, which
Ted> are they way they are and for which trying to standardize them would
Ted> probably be not worth the effort, since they don't return enough
Ted> context to the application anyway

I'm sorry, but I can't agree with this.

When error arise from system call, what application first care is to
divide error into two types.

1) devices and file systems are still under control of kernel.
2) devices or file systems are not under control of kernel anymore.

In case of 1, application will wonder if application have done
something wrong ( including user mistakenly mounted filesystem r/o
when it should be r/w, or application writing to r/o file system ).

In case of 2, application can do nothing ( and so should be kernel
). It's human who have to decide what to do.
# usualy, it means "give up the data", but sometimes, you may
# have choice of writing that data to some other devices.

Once type 2 error arise, system should not go back to type 1.
It should be one way path.


EROFS is typically error of type 1.
EIO is typically type 2.
(SIGBUS is typically type2 too. But is there any other type 2
error? None that I know of. )

What I believe (I'm sorry, but this is only my believe) is, we
should not mix error of type 1 with error of type 2. If type 2
problem arised, type 2 error should be passed to application. The
mode should be one-way.

I do agree that, for devices, it is device driver's responsibility
to identify which type of error have arised. But when file system
recieved type 2 error, he should not change it to type 1 error
( unless fs could really guarantee that ).


And, therefore, for type 2, I belive they can be standardize, and I
think we should.

I strongly agree that, for type 1, there are many ways we can
handle. I agree how you treat type 1 error would be characteristics
of each file system. Standardizing type 1 error is (in most cases)
nonsense.


I hope, Mr. Qu is looking for same thing.

regards,
----
Kenichi Okuyama
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/