Re: [GIT PULL] gfs2 fix

From: Linus Torvalds
Date: Wed Apr 27 2022 - 13:13:35 EST


On Wed, Apr 27, 2022 at 5:29 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
>
> Regular (buffered) reads and writes are expected to be atomic with
> respect to each other.

Linux has actually never honored that completely broken POSIX
requirement, although I think some filesystems (notably XFS) have
tried.

It's a completely broken concept. It's not possible to honor atomicity
with mmap(), and nobody has *ever* cared.

And it causes huge amounts of problems and basically makes any sane
locking entirely impossible.

The fact that you literally broke regular file writes in ways that are
incompatible with (much MUCH more important) POSIX file behavior to
try to get that broken read/write atomicity is only one example among
many for why that alleged rule just has to be ignored.

We do honor the PIPE_BUF atomicity on pipes, which is a completely
different kind of atomicity wrt read/write, and doesn't have the
fundamental issues that arbitrary regular file reads/writes have.

There is absolutely no sane way to do that file atomicity wrt
arbitrary read/write calls (*), and you shouldn't even try.

That rule needs to be forgotten about, and buried 6ft deep.

So please scrub any mention of that idiotic rule from documentation,
and from your brain.

And please don't break "partial write means disk full or IO error" due
to trying to follow this broken rule, which was apparently what you
did.

Because that "regular file read/write is done in full" is a *MUCH*
more important rule, and there is a shitton of applications that most
definitely depend on *that* rule.

Just go to Debian Code Search, and look for

"if (write("

and you'll get thousands of hits, and on the first page of hits 9 out
of 10 of the hits are literally about that "partial write is an
error", eg code like this:

    if (write(fd,&triple,sizeof(triple)) != sizeof(triple))
        reporterr(1,NULL);

from LibreOffice.
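
Spelled out as a minimal sketch (write_exact is a hypothetical name, not taken from LibreOffice or any other real code base), the assumption baked into that idiom looks like this: a short count from write() on a regular file is handled exactly like -1, with no retry loop.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper illustrating the common idiom: succeed only if the
 * whole buffer went out in a single write() call.
 * Returns 0 on a full write, -1 on an error or a short write.
 */
int write_exact(int fd, const void *buf, size_t len)
{
	ssize_t n = write(fd, buf, len);

	/* A partial write is reported as failure, the same as n == -1. */
	return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```

Code written this way never retries the remainder, so a filesystem that returns a short count without the disk being full or an IO error turns into a spurious failure for the application.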

Linus

(*) Yeah, if you never care about performance(**) of mixed read/write,
and you don't care about mmap, and you have no other locking issues,
it's certainly possible. The old rule came about from original UNIX
literally taking an inode lock around the whole IO access, because
that was simple, and back in the day you'd never have multiple
concurrent readers/writers anyway.

(**) It's also instructive how O_DIRECT literally throws that rule
away, and then some direct-IO people said for years that direct-IO is
superior and used this as one of their arguments. Probably the same
people who thought that "oh, don't report partial success", because we
can't deal with it.