[PATCH] open: return EINVAL for O_DIRECTORY | O_CREAT

From: Christian Brauner
Date: Tue Mar 21 2023 - 04:18:07 EST


After a couple of years and multiple LTS releases we received a report
that the behavior of O_DIRECTORY | O_CREAT changed starting with v5.7.

On kernels prior to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
had the following semantics:

(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: create regular file
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: EISDIR

(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: create regular file
* d exists and is a regular file: EEXIST
* d exists and is a directory: EEXIST

(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory

On kernels since to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
have the following semantics:

(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: ENOTDIR (create regular file)
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: EISDIR

(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: ENOTDIR (create regular file)
* d exists and is a regular file: EEXIST
* d exists and is a directory: EEXIST

(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory

This is a fairly substantial semantic change that userspace didn't
notice until Pedro took the time to deliberately figure out corner
cases. Since no one noticed this breakage we can somewhat safely assume
that O_DIRECTORY | O_CREAT combinations are likely unused.

The v5.7 breakage is especially weird because while ENOTDIR is returned
indicating failure a regular file is actually created. This doesn't make
a lot of sense.

Time was spent finding potential users of this combination. Searching on
codesearch.debian.net showed that codebases often express semantical
expectations about O_DIRECTORY | O_CREAT which are completely contrary
to what our code has done and currently does.

The expectation often is that this particular combination would create
and open a directory. This suggests users who tried to use that
combination would stumble upon the counterintuitive behavior no matter
if pre-v5.7 or post v5.7 and quickly realize neither semantics give them
what they want. For some examples see the code examples in [1] to [3]
and the discussion in [4].

There are various ways to address this issue. The lazy/simple option
would be to restore the pre-v5.7 behavior and to just live with that bug
forever. But since there's a real chance that the O_DIRECTORY | O_CREAT
quirk isn't relied upon we should try to get away with murder(ing bad
semantics) first. If we need to Frankenstein pre-v5.7 behavior later so
be it.

So let's simply return EINVAL categorically for O_DIRECTORY | O_CREAT
combinations. In addition to cleaning up the old bug this also opens up
the possiblity to make that flag combination do something more intuitive
in the future.

Starting with this commit the following semantics apply:

(1) open("/tmp/d", O_DIRECTORY | O_CREAT)
* d doesn't exist: EINVAL
* d exists and is a regular file: EINVAL
* d exists and is a directory: EINVAL

(2) open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
* d doesn't exist: EINVAL
* d exists and is a regular file: EINVAL
* d exists and is a directory: EINVAL

(3) open("/tmp/d", O_DIRECTORY | O_EXCL)
* d doesn't exist: ENOENT
* d exists and is a regular file: ENOTDIR
* d exists and is a directory: open directory

One additional note, O_TMPFILE is implemented as:

#define __O_TMPFILE 020000000
#define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
#define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)

For older kernels it was important to return an explicit error when
O_TMPFILE wasn't supported. So it is implemented to look like
O_DIRECTORY | O_RDWR but without O_CREAT. The reason was that
O_DIRECTORY | O_CREAT used to work and created a regular file. Allowing
it to be specified would've potentially caused a regular file to be
created on older kernels while the caller would believe they created an
actual O_TMPFILE. So instead O_TMPFILE has included O_DIRECTORY | O_RDWR
and the code uses O_TMPFILE_MASK to check that O_CREAT isn't specified
returning EINVAL if it is.

With this patch, we categorically reject O_DIRECTORY | O_CREAT and
return EINVAL. That means, we could write this in a way that makes the
if ((flags & O_TMPFILE_MASK) != O_TMPFILE) check superfluous but let's
not do that. The check documents the peculiar aspects of O_TMPFILE quite
nicely and can serve as a reminder how easy it is to break.

As Aleksa pointed out O_PATH is unaffected by this change since it
always returned EINVAL if O_CREAT was specified - with or without
O_DIRECTORY.

Link: https://lore.kernel.org/lkml/20230320071442.172228-1-pedro.falcato@xxxxxxxxx
Link: https://sources.debian.org/src/flatpak/1.14.4-1/subprojects/libglnx/glnx-dirfd.c/?hl=324#L324 [1]
Link: https://sources.debian.org/src/flatpak-builder/1.2.3-1/subprojects/libglnx/glnx-shutil.c/?hl=251#L251 [2]
Link: https://sources.debian.org/src/ostree/2022.7-2/libglnx/glnx-dirfd.c/?hl=324#L324 [3]
Link: https://www.openwall.com/lists/oss-security/2014/11/26/14 [4]
Reported-by: Pedro Falcato <pedro.falcato@xxxxxxxxx>
Cc: Aleksa Sarai <cyphar@xxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Christian Brauner (Microsoft) <brauner@xxxxxxxxxx>
---
This survives xfstests:

sudo ./check -g unlink,idmapped

which pounds on the creation and O_TMPFILE paths.

It also survives LTP:

sudo ./runtltp -f syscalls openat2
sudo ./runtltp -f syscalls openat
# The following includes openat2, openat, open, fsopen, open_tree, etc.
sudo ./runtltp -f syscalls open
sudo ./runtltp -f syscalls create_file
sudo ./runtltp -f syscalls create_files

I've also written a (very nasty) test script:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
int fd;

fd = open("/tmp/TEST", O_DIRECTORY | O_CREAT);
if (fd >= 0)
printf("%d\n", fd);
else
printf("%m: fd(%d)\n", fd);

fd = open("/tmp/TEST", O_DIRECTORY | O_CREAT | O_EXCL);
if (fd >= 0)
printf("%d\n", fd);
else
printf("%m: fd(%d)\n", fd);

fd = open("/tmp/TEST", O_DIRECTORY | O_EXCL);
if (fd >= 0)
printf("%d\n", fd);
else
printf("%m: fd(%d)\n", fd);

exit(EXIT_SUCCESS);
}
ubuntu@vm1:~/ssd$ sudo ./create_test
Invalid argument: fd(-1)
Invalid argument: fd(-1)
No such file or directory: fd(-1)
---
fs/open.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/open.c b/fs/open.c
index 4401a73d4032..39e360f9490d 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -1135,6 +1135,8 @@ struct file *open_with_fake_path(const struct path *path, int flags,
EXPORT_SYMBOL(open_with_fake_path);

#define WILL_CREATE(flags) (flags & (O_CREAT | __O_TMPFILE))
+#define INVALID_CREATE(flags) \
+ ((flags & (O_DIRECTORY | O_CREAT)) == (O_DIRECTORY | O_CREAT))
#define O_PATH_FLAGS (O_DIRECTORY | O_NOFOLLOW | O_PATH | O_CLOEXEC)

inline struct open_how build_open_how(int flags, umode_t mode)
@@ -1207,6 +1209,10 @@ inline int build_open_flags(const struct open_how *how, struct open_flags *op)
if (!(acc_mode & MAY_WRITE))
return -EINVAL;
}
+
+ if (INVALID_CREATE(flags))
+ return -EINVAL;
+
if (flags & O_PATH) {
/* O_PATH only permits certain other flags to be set. */
if (flags & ~O_PATH_FLAGS)
--
2.34.1