Re: [PATCH 30/32] vfs: Allow cloning of a mount tree with open(O_PATH|O_CLONE_MOUNT) [ver #8]

From: Amir Goldstein
Date: Fri Jun 01 2018 - 04:02:57 EST


[added linux-api]

On Fri, May 25, 2018 at 3:08 AM, David Howells <dhowells@xxxxxxxxxx> wrote:
> Make it possible to clone a mount tree with a new pair of open flags that
> are used in conjunction with O_PATH:
>
> (1) O_CLONE_MOUNT - Clone the mount or mount tree at the path.
>
> (2) O_NON_RECURSIVE - Don't clone recursively.
>
> Note that it's not a good idea to reuse other flags (such as O_CREAT)
> because the open routine for O_PATH does not give an error if any other
> flags are used in conjunction with O_PATH, but rather just masks off any it
> doesn't use.
>
> The resultant file struct is marked FMODE_NEED_UNMOUNT to as it pins an
> extra reference for the mount. This will be cleared by the upcoming
> move_mount() syscall when it successfully moves a cloned mount into the
> filesystem tree.
>
> Note that care needs to be taken with the error handling in do_o_path() in
> the case that vfs_open() fails as the path may or may not have been
> attached to the file struct and FMODE_NEED_UNMOUNT may or may not be set.
> Note that O_DIRECT | O_PATH could be a problem with error handling too.
>
> Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
> ---
>
[...]

> @@ -977,8 +979,11 @@ static inline int build_open_flags(int flags, umode_t mode, struct open_flags *o
> * If we have O_PATH in the open flag. Then we
> * cannot have anything other than the below set of flags
> */
> - flags &= O_DIRECTORY | O_NOFOLLOW | O_PATH;
> + flags &= (O_DIRECTORY | O_NOFOLLOW | O_PATH |
> + O_CLONE_MOUNT | O_NON_RECURSIVE);
> acc_mode = 0;
> + } else if (flags & (O_CLONE_MOUNT | O_NON_RECURSIVE)) {
> + return -EINVAL;

Reject O_NON_RECURSIVE without O_CLONE_MOUNT?
That would free at least one flag combination for future use.

Doesn't it make more sense for user API to opt-into
O_RECURSIVE_CLONE, rather than opt-out of it?


> }
>
> op->open_flag = flags;
> diff --git a/include/linux/fcntl.h b/include/linux/fcntl.h
> index 27dc7a60693e..8f60e2244740 100644
> --- a/include/linux/fcntl.h
> +++ b/include/linux/fcntl.h
> @@ -9,7 +9,8 @@
> (O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC | \
> O_APPEND | O_NDELAY | O_NONBLOCK | O_NDELAY | __O_SYNC | O_DSYNC | \
> FASYNC | O_DIRECT | O_LARGEFILE | O_DIRECTORY | O_NOFOLLOW | \
> - O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE)
> + O_NOATIME | O_CLOEXEC | O_PATH | __O_TMPFILE | \
> + O_CLONE_MOUNT | O_NON_RECURSIVE)
>
> #ifndef force_o_largefile
> #define force_o_largefile() (BITS_PER_LONG != 32)
> diff --git a/include/uapi/asm-generic/fcntl.h b/include/uapi/asm-generic/fcntl.h
> index 0b1c7e35090c..f533e35ea19b 100644
> --- a/include/uapi/asm-generic/fcntl.h
> +++ b/include/uapi/asm-generic/fcntl.h
> @@ -88,6 +88,14 @@
> #define __O_TMPFILE 020000000
> #endif
>
> +#ifndef O_CLONE_MOUNT
> +#define O_CLONE_MOUNT 040000000 /* Used with O_PATH to clone the mount subtree at path */
> +#endif
> +
> +#ifndef O_NON_RECURSIVE
> +#define O_NON_RECURSIVE 0100000000 /* Used with O_CLONE_MOUNT to only clone one mount */
> +#endif
> +
> /* a horrid kludge trying to make sure that this will fail on old kernels */
> #define O_TMPFILE (__O_TMPFILE | O_DIRECTORY)
> #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)
>

I am not sure what are the consequences of opening O_PATH with old kernel
and getting an open file, can't think of anything bad.
Can the same be claimed for O_PATH|O_CLONE_MOUNT?

Wouldn't it be better to apply the O_TMPFILE kludge to the new
open flag, so that apps can check if O_CLONE_MOUNT feature is supported
by kernel?

Thanks,
Amir.