Re: [RFC 2/2] prctl: PR_SET_MM -- Introduce PR_SET_MM_MAP operation

From: Andrew Vagin
Date: Fri Jul 04 2014 - 03:53:33 EST


On Thu, Jul 03, 2014 at 06:33:20PM +0400, Cyrill Gorcunov wrote:
> During development of c/r we've noticed that in case if we need to
> support user namespaces we face a problem with capabilities in
> prctl(PR_SET_MM, ...) call.
>
> Current PR_SET_MM code forbids to modify fields if no CAP_SYS_RESOURCE
> granted, but rather relies on one who use this interface is passing
> more-less sane values (though the values must pass the basic validation
> procedure).
>
> It seems a better approach is to eliminate CAP_SYS_RESOURCE check but
> provide all new values in one bundle, which would allow the kernel to make
> more intensive test for sanity of values and same time allow us to
> support checkpoint/restore of user namespaces.
>
> Thus a new command (PR_SET_MM_MAP) introduced. It takes a pointer of
> prctl_mm_map structure which carries all members to be updated.
>
> Most intensive work is done in validate_prctl_map_locked helper,
> because we need to make sure the values are valid. Thus we do
>
> - check the values are laying inside available user address space
> - stack, brk, command line arguments and evnironment variables
> must point to already existing VMA
> - values must be ordered (start < end)
> - if RLIMITs are defined don't allow to exceed it with new values
>
> Since it uses prctl_set_mm_exe_file_locked helper, updating the
> exe-file link is one-shot action for security reason.
>
> I believe the old interface should be deprecated and ripped off
> in a couple of kernel releases if noone against.
>
> To test if new interface is implemented in the kernel one
> can pass PR_SET_MM_MAP_SIZE opcode and the kernel returns
> the size of currently supported struct prctl_mm_map.

I like the idea of this patch. See a few comments inline

>
> Signed-off-by: Cyrill Gorcunov <gorcunov@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Tejun Heo <tj@xxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Andrew Vagin <avagin@xxxxxxxxxx>
> Cc: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
> Cc: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
> Cc: Pavel Emelyanov <xemul@xxxxxxxxxxxxx>
> Cc: Vasiliy Kulikov <segoon@xxxxxxxxxxxx>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx>
> Cc: Michael Kerrisk <mtk.manpages@xxxxxxxxx>
> ---
> include/uapi/linux/prctl.h | 19 ++++
> kernel/sys.c | 188 ++++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 206 insertions(+), 1 deletion(-)
>
> Index: linux-2.6.git/include/uapi/linux/prctl.h
> ===================================================================
> --- linux-2.6.git.orig/include/uapi/linux/prctl.h
> +++ linux-2.6.git/include/uapi/linux/prctl.h
> @@ -119,6 +119,25 @@
> # define PR_SET_MM_ENV_END 11
> # define PR_SET_MM_AUXV 12
> # define PR_SET_MM_EXE_FILE 13
> +# define PR_SET_MM_MAP 14
> +# define PR_SET_MM_MAP_SIZE 15
> +
> +struct prctl_mm_map {
> + unsigned long start_code;

"unsigned long" has different sizes on x86_64 and x86, so a compat
is required for x32 processes on x64 kernel.

> + unsigned long end_code;
> + unsigned long start_data;
> + unsigned long end_data;
> + unsigned long start_brk;
> + unsigned long brk;

...

> +
> + error |= __prctl_check_addr_space(prctl_map, start_code);
> + error |= __prctl_check_addr_space(prctl_map, end_code);
> + error |= __prctl_check_addr_space(prctl_map, start_data);
> + error |= __prctl_check_addr_space(prctl_map, end_data);
> + error |= __prctl_check_addr_space(prctl_map, start_stack);
> + error |= __prctl_check_addr_space(prctl_map, start_brk);
> + error |= __prctl_check_addr_space(prctl_map, brk);
> + error |= __prctl_check_addr_space(prctl_map, arg_start);
> + error |= __prctl_check_addr_space(prctl_map, arg_end);
> + error |= __prctl_check_addr_space(prctl_map, env_start);
> + error |= __prctl_check_addr_space(prctl_map, env_end);
> + if (error)
> + goto out;
> +#undef __prctl_check_addr_space
> +
> + /*
> + * Stack, brk, command line arguments and environment must exist.
> + */
> + stack_vma = find_vma(mm, prctl_map->start_stack);

Why do we not use __prctl_check_vma here?

> + if (!stack_vma) {
> + error = -EINVAL;
> + goto out;
> + }
> +#define __prctl_check_vma(mm, addr) find_vma(mm, addr) ? 0 : -EINVAL
> + error |= __prctl_check_vma(mm, prctl_map->start_brk);
> + error |= __prctl_check_vma(mm, prctl_map->brk);
> + error |= __prctl_check_vma(mm, prctl_map->arg_start);
> + error |= __prctl_check_vma(mm, prctl_map->arg_end);
> + error |= __prctl_check_vma(mm, prctl_map->env_start);
> + error |= __prctl_check_vma(mm, prctl_map->env_end);
> + if (error)
> + goto out;
> +#undef __prctl_check_vma
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/