Re: [PATCH 01/14] VFS: Add additional RESOLVE_* flags [ver #18]

From: Al Viro
Date: Fri Mar 13 2020 - 14:33:52 EST


On Fri, Mar 13, 2020 at 08:59:01PM +1100, Aleksa Sarai wrote:
> On 2020-03-12, Stefan Metzmacher <metze@xxxxxxxxx> wrote:
> > Am 12.03.20 um 17:24 schrieb Linus Torvalds:
> > > But yes, if we have a major package like samba use it, then by all
> > > means let's add linkat2(). How many things are we talking about? We
> > > have a number of system calls that do *not* take flags, but do do
> > > pathname walking. I'm thinking things like "mkdirat()"?)
> >
> > I haven't looked them up in detail yet.
> > Jeremy can you provide a list?
> >
> > Do you think we could route some of them like mkdirat() and mknodat()
> > via openat2() instead of creating new syscalls?
>
> I have heard some folks asking for a way to create a directory and get a
> handle to it atomically -- so arguably this is something that could be
> inside openat2()'s feature set (O_MKDIR?). But I'm not sure how popular
> of an idea this is.

For fuck sake, *NO*!

We don't need any more multiplexors from hell. mkdir() and open() have
deeply different interpretation of pathnames (and anyone who asks for
e.g. traversals of dangling symlinks on mkdir() is insane). Don't try to
mix those; even O_TMPFILE had been a mistake.

Folks, we'd paid very dearly for the atomic_open() merge. We are _still_
paying for it - and keep finding bugs induced by the convoluted horrors
in that thing (see yesterday pull from vfs.git#fixes for the latest crop).
I hope to get into more or less sane shape (part - this cycle, with
followups in the next one), but the last thing we need is more complexity
in the area.

Keep the semantics simple and regular; corner cases _suck_. "Infinitely
extensible (without review)" is no virtue. And having nowhere to hide
very special flags for very special kludges is a bloody good thing.

Every fucking time we had a multiplexed syscall, it had been a massive
source of trouble. IF it has a uniform semantics - fine; we don't need
arseloads of read_this(2)/read_that(2). But when you need pages upon
pages to describe the subtle differences in the interpretation of
its arguments, you have already lost. It will be full of corner
cases, they will get zero testing and they will rot. Inevitably. All
the faster for the lack of people who would be able to keep all of that
in head.

We do have a mechanism for multiplexing; on amd64 it lives in do_syscall_64().
We really don't need openat2() turning into another one. Syscall table
slots are not in a short supply, and the level of review one gets from
"new syscall added" is higher than from "make fubar(2) recognize a new
member in options->union_full_of_crap if it has RESOLVE_TO_WANK_WITH_RIGHT_HAND
set in options->flags, affecting its behaviour in some odd ways".
Which is a good thing, damnit.