Re: [PATCH] do_open(): Fix O_DIRECTORY | O_CREAT behavior

From: Christian Brauner
Date: Tue Mar 21 2023 - 12:18:01 EST


On Tue, Mar 21, 2023 at 03:24:19PM +0100, Christian Brauner wrote:
> On Mon, Mar 20, 2023 at 01:24:52PM -0700, Linus Torvalds wrote:
> > On Mon, Mar 20, 2023 at 12:27 PM Pedro Falcato <pedro.falcato@xxxxxxxxx> wrote:
> > >
> > > 1) Pre v5.7 Linux did the open-dir-if-exists-else-create-regular-file
> > > we all know and """love""".
> >
> > So I think we should fall back to this as a last resort, as a "well,
> > it's our historical behavior".
> >
> > > 2) Post 5.7, we started returning this buggy -ENOTDIR error, even when
> > > successfully creating a file.
> >
> > Yeah, I think this is the worst of the bunch and has no excuse (unless
> > some crazy program has started depending on it, which sounds really
> > *really* unlikely).
> >
> > > 3) NetBSD just straight up returns EINVAL on open(O_DIRECTORY | O_CREAT)
> > > 4) FreeBSD's open(O_CREAT | O_DIRECTORY) succeeds if the file exists
> > > and is a directory. Fails with -ENOENT if it falls onto the "O_CREAT"
> > > path (i.e it doesn't try to create the file at all, just ENOENT's;
> > > this changed relatively recently, in 2015)
> >
> > Either of these sound sensible to me.
> >
> > I suspect (3) is the clearest case.
>
> Yeah, we should try that.
>
> >
> > And (4) might be warranted just because it's closer to what we used to
> > do, and it's *possible* that somebody happens to use O_DIRECTORY |
> > O_CREAT on directories that exist, and never noticed how broken that
> > was.
>
> I really really hope that isn't the case because (4) is pretty nasty.
> Having to put this on a manpage seems nightmarish.
>
> >
> > And (4) has another special case: O_EXCL. Because I'm really hoping
> > that O_DIRECTORY | O_EXCL will always fail.
>
> I've detailed the semantics for that in the commit message below...
>
> >
> > Is the proper patch something along the lines of this?
>
> Yeah, I think that would do it and I think that's what we should try to
> get away with. I just spent time and took a look who passes O_DIRECTORY
> and O_CREAT and interestingly there are a number of comments roughly
> along the lines of the following example:
>
> /* Ideally we could use openat(O_DIRECTORY | O_CREAT | O_EXCL) here
> * to create and open the directory atomically
>
> suggests that people who specify O_DIRECTORY | O_CREAT are interested in
> creating a directory. But since this never did work they don't tend to
> use that flag combination (I've collected a few samples in [1] to [4].).
>
> (As a sidenote, posix made an interpretation change a long time ago to
> at least allow for O_DIRECTORY | O_CREAT to create a directory (see [3]).
>
> But that's a whole different can of worms and I haven't spent any
> thoughts even on feasibility. And even if we should probably get through
> a couple of kernels with O_DIRECTORY | O_CREAT failing with EINVAL first.)
>
> >
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -1186,6 +1186,8 @@ inline int build_open_flags(const struct
> > open_how *how, struct open_flags *op)
> >
> > /* Deal with the mode. */
> > if (WILL_CREATE(flags)) {
> > + if (flags & O_DIRECTORY)
> > + return -EINVAL;
>
> This will be problematic because for weird historical reasons O_TMPFILE
> includes O_DIRECTORY so this would unfortunately break O_TMPFILE. :/
> I'll try to have a patch ready in a bit.

So, had to do some testing first:

This survives xfstests:
sudo ./check -g unlink,idmapped
which pounds on the creation and O_TMPFILE paths.

It also survives LTP:
# The following includes openat2, openat, open, fsopen, open_tree, etc.
sudo ./runtltp -f syscalls open

I've also written a (very nasty) test script:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
int fd;

fd = open("/tmp/TEST", O_DIRECTORY | O_CREAT);
if (fd >= 0)
printf("%d\n", fd);
else
printf("%m: fd(%d)\n", fd);

fd = open("/tmp/TEST", O_DIRECTORY | O_CREAT | O_EXCL);
if (fd >= 0)
printf("%d\n", fd);
else
printf("%m: fd(%d)\n", fd);

fd = open("/tmp/TEST", O_DIRECTORY | O_EXCL);
if (fd >= 0)
printf("%d\n", fd);
else
printf("%m: fd(%d)\n", fd);

exit(EXIT_SUCCESS);
}
ubuntu@vm1:~/ssd$ sudo ./create_test
Invalid argument: fd(-1)
Invalid argument: fd(-1)
No such file or directory: fd(-1)

I think this should work. From the commit message it's hopefully clear
that this is proper semantic change. But I think we might be able to
pull this off given how rare this combination should be and how
unnoticed the current regression has gone and for how long...

So I came up with the following description trying to detail the
semantics prior to v5.7, post v5.7 up until this patch, and then after
the patch. Linus, I added your SOB to this but tell me if you'd rather
not be associated with this potential mess...

It would be very nice if we had tests for the new behavior. So if @Pedro
would be up for it that would be highly appreciated. If not I'll put it
on my ToDo...

So I can carry this and sent a pr or it can be picked up directly. I'm
not fuzzed. Hopefully there are no surprises or objections...