Re: [PATCH v18 08/15] seccomp: add system call filtering using BPF

From: Kees Cook
Date: Thu Apr 12 2012 - 18:19:26 EST


On Thu, Apr 12, 2012 at 2:47 PM, Will Drewry <wad@xxxxxxxxxxxx> wrote:
> This patch adds support for seccomp mode 2.  Mode 2 introduces the
> ability for unprivileged processes to install system call filtering
> policy expressed in terms of a Berkeley Packet Filter (BPF) program.
> This program will be evaluated in the kernel for each system call
> the task makes and computes a result based on data in the format
> of struct seccomp_data.
>
> A filter program may be installed by calling:
>  struct sock_fprog fprog = { ... };
>  ...
>  prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
>
> The return value of the filter program determines if the system call is
> allowed to proceed or denied.  If the first filter program installed
> allows prctl(2) calls, then the above call may be made repeatedly
> by a task to further reduce its access to the kernel.  All attached
> programs must be evaluated before a system call will be allowed to
> proceed.
>
> Filter programs will be inherited across fork/clone and execve.
> However, if the task attaching the filter is unprivileged
> (!CAP_SYS_ADMIN) the no_new_privs bit will be set on the task.  This
> ensures that unprivileged tasks cannot attach filters that affect
> privileged tasks (e.g., setuid binary).
>
> There are a number of benefits to this approach. A few of which are
> as follows:
> - BPF has been exposed to userland for a long time
> - BPF optimization (and JIT'ing) are well understood
> - Userland already knows its ABI: system call numbers and desired
>  arguments
> - No time-of-check-time-of-use vulnerable data accesses are possible.
> - system call arguments are loaded on access only to minimize copying
>  required for system call policy decisions.
>
> Mode 2 support is restricted to architectures that enable
> HAVE_ARCH_SECCOMP_FILTER.  In this patch, the primary dependency is on
> syscall_get_arguments().  The full desired scope of this feature will
> add a few minor additional requirements expressed later in this series.
> Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
> the desired additional functionality.
>
> No architectures are enabled in this patch.
>
> Signed-off-by: Will Drewry <wad@xxxxxxxxxxxx>
> Acked-by: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
> Reviewed-by: Indan Zupancic <indan@xxxxxx>
> Acked-by: Eric Paris <eparis@xxxxxxxxxx>

Acked-by: Kees Cook <keescook@xxxxxxxxxxxx>

--
Kees Cook
ChromeOS Security
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/