Re: POSIX violation by writeback error

From: Theodore Y. Ts'o
Date: Tue Sep 25 2018 - 11:46:37 EST


On Tue, Sep 25, 2018 at 07:15:34AM -0400, Jeff Layton wrote:
> Linux has dozens of filesystems and they all behave differently in this
> regard. A catastrophic failure (paradoxically) makes things simpler for
> the fs developer, but even on local filesystems isolated errors can
> occur. It's also not just NFS -- what mostly started me down this road
> was working on ENOSPC handling for CephFS.
>
> I think it'd be good to at least establish a "gold standard" for what
> filesystems ought to do in this situation. We might not be able to
> achieve that in all cases, but we could then document the exceptions.

I'd argue the standard should be the precedent set by AFS and NFS.
AFS verifies space available on close(2) and returns ENOSPC from the
close(2) system call if space is not available. At MIT Project
Athena, where we used AFS extensively in the late 80's and early 90's,
we made and contributed back changes to avoid data loss as a result of
quota errors.

The best practice that should be documented for userspace is that, when
writing precious files[1], programs should open foo.new for writing, write
out the data, call fsync() and check the error return, call close()
and check the error return, and then call rename(foo.new, foo) and
check the error return. Writing a library function which does this,
and which also copies the ACL's and xattr's from foo to foo.new before
the rename() would probably help, but not as much as we might think.
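
As a rough illustration of that sequence, here is a minimal sketch in
Python, whose os module maps closely onto the underlying system calls.
The function name replace_file_atomically and the ".new" suffix are
only illustrative assumptions, and the sketch does not copy ACLs or
xattrs from the old file. Every step raises OSError on failure, so
none of the error returns is silently dropped.

```python
import os

def replace_file_atomically(path, data):
    """Write data (bytes) to path + ".new", then rename it over path.

    Hypothetical helper: errors from write, fsync, close and rename
    all surface as OSError, and the temporary file is removed on failure.
    """
    tmp_path = path + ".new"  # naming convention assumed for this sketch
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        try:
            view = memoryview(data)
            while view:
                # os.write may write fewer bytes than requested
                written = os.write(fd, view)
                view = view[written:]
            os.fsync(fd)      # check the error return from fsync()
        finally:
            os.close(fd)      # check the error return from close()
        os.rename(tmp_path, path)  # only reached if everything above succeeded
    except OSError:
        # Leave the original file untouched and clean up the partial copy.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A caller would use it as replace_file_atomically("foo", b"contents")
and treat any OSError as "the old foo is still intact".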

[1] That is, editors writing source files, but not compilers and
similar programs writing object files and other generated files.

None of this is really all that new. We had the same discussion back
during the O_PONIES controversy, and we came out in the same place.

- Ted

P.S. One thought: it might be cool if there was some way for
userspace applications to mark files with a "nuke if not closed" flag,
such that the file system would automatically unlink the file after a
reboot following a system crash, or if the process is killed or exits
without an explicit close(2). For networked/remote file systems that
supported this flag, after the client comes back up after a reboot, it
could notify the server that all files created previously from that
client should be unlinked.

Unlike O_TMPFILE, this would require file system changes to support,
so maybe it's not worth having something which automatically cleans up
files that were in the middle of being written at the time of a system
crash. (Especially since you can get most of the functionality by
using some naming convention for files that are in the process of being
written, and then teaching some program that regularly scans the
entire file system, such as updatedb(8), to nuke the files from a cron
job. It won't be as efficient, but it would be much easier to
implement.)
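
The naming-convention approach could look roughly like the sketch
below, written as a standalone function that a cron job might call.
Everything specific here is an assumption for illustration: the
function name remove_stale_partial_files, the ".new" suffix, and the
one-day age threshold, which is there so that files still being
written are not removed out from under a live process.

```python
import os
import stat
import time

def remove_stale_partial_files(root, suffix=".new", max_age_seconds=24 * 3600):
    """Unlink regular files under root whose names end in suffix and whose
    mtime is older than max_age_seconds. Returns the removed paths.

    Hypothetical helper; suffix and age threshold are assumptions.
    """
    cutoff = time.time() - max_age_seconds
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)  # do not follow symlinks
                if stat.S_ISREG(st.st_mode) and st.st_mtime < cutoff:
                    os.unlink(full)
                    removed.append(full)
            except FileNotFoundError:
                # The file vanished between the walk and the check.
                continue
    return removed
```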