Re: [PATCH 6/7] exec: Move most of setup_new_exec into flush_old_exec

From: Kees Cook
Date: Tue May 05 2020 - 17:29:26 EST


On Tue, May 05, 2020 at 02:45:33PM -0500, Eric W. Biederman wrote:
>
> The current idiom for the callers is:
>
> flush_old_exec(bprm);
> set_personality(...);
> setup_new_exec(bprm);
>
> In 2010 Linus split flush_old_exec into flush_old_exec and
> setup_new_exec. With the intention that setup_new_exec be what is
> called after the processes new personality is set.
>
> Move the code that doesn't depend upon the personality from
> setup_new_exec into flush_old_exec. This is to facilitate future
> changes by having as much code together in one function as possible.

Er, I *think* this is okay, but I have some questions below which
maybe you already investigated (and should perhaps get called out in
the changelog).

>
> Ref: 221af7f87b97 ("Split 'flush_old_exec' into two functions")
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
> fs/exec.c | 85 ++++++++++++++++++++++++++++---------------------------
> 1 file changed, 44 insertions(+), 41 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 8c3abafb9bb1..0eff20558735 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -1359,39 +1359,7 @@ int flush_old_exec(struct linux_binprm * bprm)
> * undergoing exec(2).
> */
> do_close_on_exec(me->files);
> - return 0;
> -
> -out_unlock:
> - mutex_unlock(&me->signal->exec_update_mutex);
> -out:
> - return retval;
> -}
> -EXPORT_SYMBOL(flush_old_exec);
> -
> -void would_dump(struct linux_binprm *bprm, struct file *file)
> -{
> - struct inode *inode = file_inode(file);
> - if (inode_permission(inode, MAY_READ) < 0) {
> - struct user_namespace *old, *user_ns;
> - bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> -
> - /* Ensure mm->user_ns contains the executable */
> - user_ns = old = bprm->mm->user_ns;
> - while ((user_ns != &init_user_ns) &&
> - !privileged_wrt_inode_uidgid(user_ns, inode))
> - user_ns = user_ns->parent;
>
> - if (old != user_ns) {
> - bprm->mm->user_ns = get_user_ns(user_ns);
> - put_user_ns(old);
> - }
> - }
> -}
> -EXPORT_SYMBOL(would_dump);
> -
> -void setup_new_exec(struct linux_binprm * bprm)
> -{
> - struct task_struct *me = current;
> /*
> * Once here, prepare_binrpm() will not be called any more, so
> * the final state of setuid/setgid/fscaps can be merged into the
> @@ -1414,8 +1382,6 @@ void setup_new_exec(struct linux_binprm * bprm)
> bprm->rlim_stack.rlim_cur = _STK_LIM;
> }
>
> - arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> -
> me->sas_ss_sp = me->sas_ss_size = 0;
>
> /*
> @@ -1430,16 +1396,9 @@ void setup_new_exec(struct linux_binprm * bprm)
> else
> set_dumpable(current->mm, SUID_DUMP_USER);
>
> - arch_setup_new_exec();
> perf_event_exec();

What is perf expecting to be able to examine at this point? Does it want
a view of things after arch_setup_new_exec()? (i.e. "final" TIF flags,
mmap layout, etc.) From what I can, the answer is "no, it's just
resetting counters", so I think this is fine. Maybe double-check with
Steve?

> __set_task_comm(me, kbasename(bprm->filename), true);
>
> - /* Set the new mm task size. We have to do that late because it may
> - * depend on TIF_32BIT which is only updated in flush_thread() on
> - * some architectures like powerpc
> - */
> - me->mm->task_size = TASK_SIZE;
> -
> /* An exec changes our domain. We are no longer part of the thread
> group */
> WRITE_ONCE(me->self_exec_id, me->self_exec_id + 1);
> @@ -1467,6 +1426,50 @@ void setup_new_exec(struct linux_binprm * bprm)
> * credentials; any time after this it may be unlocked.
> */
> security_bprm_committed_creds(bprm);

Similarly for the LSM hook: is it expecting a post-arch-setup view? I
don't see anything looking at task_size, TIF flags, or anything else;
they seem to be just cleaning up from the old process being replaced, so
against, I think this is okay.

Not visible in this patch, the following things how happen earlier,
which I feel should maybe get called out in the changelog, with,
perhaps, better justification than what I've got here:

bprm->secureexec set/check (looks safe, since it depends on
prepare_binprm()'s security_bprm_set_creds().

rlim_stack.rlim_cur setting (safe, just needs to happen before
arch_pick_mmap_layout())

dumpable() check (looks safe, BINPRM_FLAGS_ENFORCE_NONDUMP depends on
much earlier would_dump(), and uid/gid depend on earlier calls to
prepare_binprm()'s bprm_fill_uid())

__set_task_comm (looks safe, just dealing with the task name...)

self_exec_id bump (looks safe, but I think -- it's still after uid
setting)

flush_signal_handlers() (looks safe -- nothing appears to depend on mm
nor personality)

> + return 0;
> +
> +out_unlock:
> + mutex_unlock(&me->signal->exec_update_mutex);
> +out:
> + return retval;
> +}
> +EXPORT_SYMBOL(flush_old_exec);
> +
> +void would_dump(struct linux_binprm *bprm, struct file *file)
> +{
> + struct inode *inode = file_inode(file);
> + if (inode_permission(inode, MAY_READ) < 0) {
> + struct user_namespace *old, *user_ns;
> + bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
> +
> + /* Ensure mm->user_ns contains the executable */
> + user_ns = old = bprm->mm->user_ns;
> + while ((user_ns != &init_user_ns) &&
> + !privileged_wrt_inode_uidgid(user_ns, inode))
> + user_ns = user_ns->parent;
> +
> + if (old != user_ns) {
> + bprm->mm->user_ns = get_user_ns(user_ns);
> + put_user_ns(old);
> + }
> + }
> +}
> +EXPORT_SYMBOL(would_dump);

The diff helpfully decided this moved would_dump(). ;) Is it worth
maybe just moviing it explicitly above flush_old_exec() to avoid this
churn? I dunno.

> +
> +void setup_new_exec(struct linux_binprm * bprm)
> +{
> + /* Setup things that can depend upon the personality */

Should this comment be above the function instead?

> + struct task_struct *me = current;
> +
> + arch_pick_mmap_layout(me->mm, &bprm->rlim_stack);
> +
> + arch_setup_new_exec();
> +
> + /* Set the new mm task size. We have to do that late because it may
> + * depend on TIF_32BIT which is only updated in flush_thread() on
> + * some architectures like powerpc
> + */
> + me->mm->task_size = TASK_SIZE;
> mutex_unlock(&me->signal->exec_update_mutex);
> mutex_unlock(&me->signal->cred_guard_mutex);
> }
> --
> 2.20.1
>

So, as I say, I *think* this is okay, but I always get suspicious about
reordering things in execve(). ;)

So, with a bit larger changelog discussing what's moving "earlier",
I think this looks good:

Reviewed-by: Kees Cook <keescook@xxxxxxxxxxxx>

--
Kees Cook