Re: [PATCH] clone3: validate stack arguments

From: Aleksa Sarai
Date: Thu Oct 31 2019 - 09:59:45 EST


On 2019-10-31, Christian Brauner <christian.brauner@xxxxxxxxxx> wrote:
> Validate the stack arguments and setup the stack depening on whether or not
> it is growing down or up.
>
> Legacy clone() required userspace to know in which direction the stack is
> growing and pass down the stack pointer appropriately. To make things more
> confusing microblaze uses a variant of the clone() syscall selected by
> CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument.
> IA64 has a separate clone2() syscall which also takes an additional
> stack_size argument. Finally, parisc has a stack that is growing upwards.
> Userspace therefore has a lot nasty code like the following:
>
> #define __STACK_SIZE (8 * 1024 * 1024)
> pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
> {
> pid_t ret;
> void *stack;
>
> stack = malloc(__STACK_SIZE);
> if (!stack)
> return -ENOMEM;
>
> #ifdef __ia64__
> ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
> #elif defined(__parisc__) /* stack grows up */
> ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd);
> #else
> ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
> #endif
> return ret;
> }
>
> or even crazier variants such as [3].
>
> With clone3() we have the ability to validate the stack. We can check that
> when stack_size is passed, the stack pointer is valid and the other way
> around. We can also check that the memory area userspace gave us is fine to
> use via access_ok(). Furthermore, we probably should not require
> userspace to know in which direction the stack is growing. It is easy
> for us to do this in the kernel and I couldn't find the original
> reasoning behind exposing this detail to userspace.
>
> /* Intentional user visible API change */
> clone3() was released with 5.3. Currently, it is not documented and very
> unclear to userspace how the stack and stack_size argument have to be
> passed. After talking to glibc folks we concluded that trying to change
> clone3() to setup the stack instead of requiring userspace to do this is
> the right course of action.
> Note, that this is an explicit change in user visible behavior we introduce
> with this patch. If it breaks someone's use-case we will revert! (And then
> e.g. place the new behavior under an appropriate flag.)
> Breaking someone's use-case is very unlikely though. First, neither glibc
> nor musl currently expose a wrapper for clone3(). Second, there is no real
> motivation for anyone to use clone3() directly since it does not provide
> features that legacy clone doesn't. New features for clone3() will first
> happen in v5.5 which is why v5.4 is still a good time to try and make that
> change now and backport it to v5.3. Searches on [4] did not reveal any
> packages calling clone3().
>
> [1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17TA@xxxxxxxxxxxxxx
> [2]: 20191028172143.4vnnjpdljfnexaq5@wittgenstein">https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein
> [3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3b338/src/basic/raw-clone.h#L31
> [4]: https://codesearch.debian.net
> Fixes: 7f192e3cd316 ("fork: add clone3")
> Cc: Arnd Bergmann <arnd@xxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Jann Horn <jannh@xxxxxxxxxx>
> Cc: David Howells <dhowells@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Florian Weimer <fweimer@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: linux-api@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: <stable@xxxxxxxxxxxxxxx> # 5.3
> Cc: GNU C Library <libc-alpha@xxxxxxxxxxxxxx>
> Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>

Looks reasonable, and it'll be lovely to not have to worry about stack
ordering anymore. As for the UAPI breakage, I agree that nobody is using
clone3() yet and so we still have a bit of lee-way to fix this
behaviour.

Acked-by: Aleksa Sarai <cyphar@xxxxxxxxxx>

> ---
> include/uapi/linux/sched.h | 4 ++++
> kernel/fork.c | 33 ++++++++++++++++++++++++++++++++-
> 2 files changed, 36 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
> index 99335e1f4a27..25b4fa00bad1 100644
> --- a/include/uapi/linux/sched.h
> +++ b/include/uapi/linux/sched.h
> @@ -51,6 +51,10 @@
> * sent when the child exits.
> * @stack: Specify the location of the stack for the
> * child process.
> + * Note, @stack is expected to point to the
> + * lowest address. The stack direction will be
> + * determined by the kernel and set up
> + * appropriately based on @stack_size.
> * @stack_size: The size of the stack for the child process.
> * @tls: If CLONE_SETTLS is set, the tls descriptor
> * is set to tls.
> diff --git a/kernel/fork.c b/kernel/fork.c
> index bcdf53125210..55af6931c6ec 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
> return 0;
> }
>
> -static bool clone3_args_valid(const struct kernel_clone_args *kargs)
> +/**
> + * clone3_stack_valid - check and prepare stack
> + * @kargs: kernel clone args
> + *
> + * Verify that the stack arguments userspace gave us are sane.
> + * In addition, set the stack direction for userspace since it's easy for us to
> + * determine.
> + */
> +static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
> +{
> + if (kargs->stack == 0) {
> + if (kargs->stack_size > 0)
> + return false;
> + } else {
> + if (kargs->stack_size == 0)
> + return false;
> +
> + if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
> + return false;
> +
> +#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64)
> + kargs->stack += kargs->stack_size;
> +#endif
> + }
> +
> + return true;
> +}
> +
> +static bool clone3_args_valid(struct kernel_clone_args *kargs)
> {
> /*
> * All lower bits of the flag word are taken.
> @@ -2581,6 +2609,9 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
> kargs->exit_signal)
> return false;
>
> + if (!clone3_stack_valid(kargs))
> + return false;
> +
> return true;
> }

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachment: signature.asc
Description: PGP signature