Re: POSIX violation by writeback error

From: Martin Steigerwald
Date: Wed Sep 05 2018 - 03:37:30 EST


Jeff Layton - 04.09.18, 17:44:
> > - If the following read() could be served by a page in memory, just
> > returns the data. If the following read() could not be served by a
> > page in memory and the inode/address_space has a writeback error
> > mark, returns EIO. If there is a writeback error on the file, and the
> > requested data could not be served by a page in memory, it means we
> > are reading a (partially) corrupted (out-of-date) file. Receiving an
> > EIO is expected.
>
> No, an error on read is not expected there. Consider this:
>
> Suppose the backend filesystem (maybe an NFSv3 export) is really r/o,
> but was mounted r/w. An application queues up a bunch of writes that
> of course can't be written back (they get EROFS or something when
> they're flushed back to the server), but that application never calls
> fsync.
>
> A completely unrelated application is running as a user that can open
> the file for read, but not r/w. It then goes to open and read the file
> and then gets EIO back or maybe even EROFS.
>
> Why should that application (which did zero writes) have any reason to
> think that the error was due to prior writeback failure by a
> completely separate process? Does EROFS make sense when you're
> attempting to do a read anyway?
>
> Moreover, what is that application's remedy in this case? It just
> wants to read the file, but may not be able to even open it for write
> to issue an fsync to "clear" the error. How do we get things moving
> again so it can do what it wants?
>
> I think your suggestion would open the floodgates for local DoS
> attacks.

I wonder whether a new error code for reporting writeback errors like
this could help out of this situation. But from all I have read here so
far, this is a really challenging situation to deal with.

I still remember how AmigaOS dealt with this case, and from a usability
point of view it was close to ideal: if a disk was removed, whether a
floppy disk, a network disk provided by Envoy or even a hard disk, it
popped up a dialog "You MUST insert volume <name of volume> again". And
if you did, it continued writing. That worked even with networked
devices. I tested it: I unplugged the ethernet cable, replugged it, and
it continued writing.

I can imagine that this would be quite challenging to implement within
Linux. I remember that a Google Summer of Code project to implement this
was offered, for NetBSD at least, but I never learned whether it was
taken or even implemented. If so, it might serve as an inspiration.
Anyway, AmigaOS did this even for stationary hard disks. I had a flaky
connection through an IDE-to-SCSI and then SCSI-to-UW-SCSI adapter, and
when the hard disk had connection issues that dialog popped up, with the
name of the operating system volume for example.

Every access to it was blocked then. It simply blocked all processes
that accessed it until it became available again (usually I rebooted in
the case of a stationary device, since I had to open the case, or hot
plugging was not available or not working).

But AFAIR AmigaOS also did not have a notion of caching writes for
longer than maybe a few seconds or so, and I think only within the
device driver. Writes were (almost) immediate. There were some
asynchronous I/O libraries, and I would expect a delay in the dialog
popping up in that case.

It would be challenging to implement for Linux, even just for removable
devices. You have page dirtying and delayed writeback, which is still a
performance issue with NFS over 1 GBit: with rsync from local storage
that is faster than 1 GBit and huge files, reducing the dirty memory
ratio may help to halve the time needed to complete the rsync copy
operation. And you would need to communicate all the way up to userspace
to let the user know about the issue.
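
Just to illustrate the kind of tuning I mean, a minimal sketch using the
usual vm sysctls; the values here are made up for the example, not a
recommendation:

    # Start background writeback earlier and throttle writers sooner
    # than the defaults (percent of available memory).
    sysctl -w vm.dirty_background_ratio=5
    sysctl -w vm.dirty_ratio=10

    # Or use absolute limits; a non-zero *_bytes knob overrides the
    # corresponding *_ratio knob.
    sysctl -w vm.dirty_background_bytes=$((256 * 1024 * 1024))   # 256 MiB
    sysctl -w vm.dirty_bytes=$((1 * 1024 * 1024 * 1024))         # 1 GiB

With huge copies to comparatively slow storage, lower thresholds mean
the writer gets throttled earlier instead of piling up gigabytes of
dirty pages first.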

Still, at least for removable media, this would be close to the most
usability-friendly approach. With robust filesystems (the Amiga Old
Filesystem and Fast Filesystem were not robust against sudden write
interruption, so the "MUST" was meant that way) one may even offer
"Please insert device <name of device> again to write out unwritten data
or choose to discard that data" in a dialog. And for removable media it
may even work, since blocking the processes that access it usually would
not block the whole system. But for the operating system disk? I know
how the Plasma desktop behaves during massive I/O operations. It usually
just grinds to a complete halt. It seems to me that its processes do
some I/O almost all of the time, or that the Linux kernel blocks other
syscalls too during heavy I/O load.

I just wanted to mention it as another crazy idea. But I bet it would
practically require rewriting the I/O subsystem in Linux to a great
extent, probably diminishing its performance in situations of write
pressure. Or maybe a genius finds a way to implement both. :)

What I do think, though, is that the dirty page caching of Linux with
its current default settings is excessive. 5% / 10% of available memory
is often a lot these days. There has been a discussion about reducing
the default, but AFAIK it was never done. Linus suggested in that
discussion limiting it to roughly what the storage can write out in 3 to
5 seconds. That may even help with error reporting, as reducing the
dirty memory ratio reduces memory pressure, so you may be able to afford
some memory allocations for error handling. And the time until you know
it is not working may be shorter.
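
A rough back-of-the-envelope illustration of that suggestion (the
numbers are invented for the example):

    Storage sustaining ~100 MB/s, 3 to 5 seconds -> ~300-500 MB of dirty data
    10% of a 32 GiB machine                      -> ~3.4 GB, i.e. over half a
                                                    minute to flush at 100 MB/s

So the current percentage-based defaults can allow far more dirty data
than the storage can write back within a few seconds.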

Thanks,
--
Martin