Magic Security Dust: Appropriating SECCOMP

From: David Thomas
Date: Wed Jun 24 2009 - 01:05:39 EST


"Code I trust processing data I don't" is a common situation.
Web browsers, movie players, image viewers, document readers,
video games, and many other applications deal with data
that could be malicious. I was looking for an easy way to
restrict the damage my software might do should my handling
of malicious data be less than perfect. Under Linux, the options
I found were SECCOMP and ptrace. SECCOMP seemed much too narrow
(it permits only read, write, exit and sigreturn), while ptrace
seemed more complicated than I needed and also incurs a
performance penalty, since the tracer must be woken for every
traced syscall. So I modified SECCOMP as described below, and
have been running on this kernel since November with no apparent
issues or change in performance. Before I update the patch to the
latest kernel, I'd like to hear what people think.

For those unfamiliar, SECCOMP allows a process to restrict
the set of syscalls it may make from then on, by setting a
mode with prctl(PR_SET_SECCOMP, ...).
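For anyone who hasn't used it, here is a minimal user-space sketch of stock SECCOMP mode 1, assuming a Linux kernel built with CONFIG_SECCOMP and glibc. The helper run_in_strict_mode is hypothetical: it forks a child that enters mode 1 through prctl(PR_SET_SECCOMP, 1), writes a line (permitted), optionally calls getpid (not permitted, so the kernel kills the child with SIGKILL), and leaves via the raw exit syscall. The child exits with status 2 if the prctl call itself fails, e.g. on a kernel without SECCOMP or, on newer kernels, when a seccomp filter is already installed.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, put it in SECCOMP mode 1 and return
 * its wait status. If try_forbidden is nonzero, the child also calls
 * getpid, which mode 1 does not allow. Returns -1 if fork/waitpid fail. */
static int run_in_strict_mode(int try_forbidden)
{
    static const char msg[] = "child: inside SECCOMP mode 1\n";
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Mode 1 (later named SECCOMP_MODE_STRICT): only read, write,
         * exit and sigreturn remain available from here on. */
        if (prctl(PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
            _exit(2);                          /* SECCOMP unavailable */
        ssize_t n = write(STDOUT_FILENO, msg, sizeof msg - 1); /* allowed */
        (void)n;
        if (try_forbidden)
            syscall(SYS_getpid);               /* not allowed: SIGKILL */
        /* Use the raw exit syscall: glibc's _exit() calls exit_group,
         * which mode 1 does not permit. This call does not return. */
        syscall(SYS_exit, 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Note that even the normal way out of a C program is off limits in mode 1, which is part of why it felt too narrow for general use.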

While SECCOMP originally worked from "modes", each with a list
of allowable syscalls, I thought it better to have a set of
flags, for two reasons. First, checking whether a syscall should
be permitted becomes a simple bitwise AND. Second, I find it
easier to reason about composing groups of syscalls than to
remember precisely what each of a list of modes permits or
denies. As there was previously only mode 1, having flag bit
zero (value 1) provide the same syscalls maintains backwards
compatibility.
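A minimal sketch of the flag idea (illustrative, not the patch itself). The group names (SC_GROUP_*), their syscall contents, and the choice to store a per-process mask of permitted groups with all bits set meaning "unrestricted" are assumptions made for this example. What it shows is that bit zero has the same value as the old mode number 1, and that the permission check is a single AND.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical syscall groups, one bit each. Bit zero reproduces the old
 * mode 1 list, so requesting "1" keeps its old meaning. */
#define SC_GROUP_BASIC  (1u << 0)  /* read, write, exit, sigreturn */
#define SC_GROUP_FILES  (1u << 1)  /* hypothetical: open, close, ... */
#define SC_GROUP_PROC   (1u << 2)  /* hypothetical: fork, exec, ... */

/* Assumed representation: all bits set means SECCOMP is not in use. */
#define SC_ALLOW_ALL    UINT32_MAX

/* The whole permission check is one bitwise AND against the mask of
 * groups this process may still use. */
static bool seccomp_permits(uint32_t allowed, uint32_t group)
{
    return (allowed & group) != 0;
}
```

Composing groups is then just OR-ing bits: a viewer that reads files it opens itself might ask for SC_GROUP_BASIC | SC_GROUP_FILES and nothing else.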

Moving the checks out of the audit/trace code and into the
individual syscalls means that each syscall does one check and
a correctly predicted branch, instead of n comparisons against
the mode's list of allowed syscalls with (usually) a
mispredicted branch. This also means greater granularity at
build time: rather than merely "SECCOMP or no SECCOMP", an
individual build can check the flags for open and fork but not
for read and write, or any other combination. Because no
process will be relying on these checks to *add* functionality,
this is completely transparent to user space and can be
configured as best suits each deployment, balancing the users'
paranoia against their need for speed.
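To make the per-syscall placement concrete, here is a user-space sketch of what a check inside each syscall might look like. CONFIG_SECCOMP_CHECK_OPEN, CONFIG_SECCOMP_CHECK_READ, SECCOMP_CHECK, current_allowed and the sketch_sys_* functions are all hypothetical names; unlikely() mirrors the kernel macro of that name using GCC/Clang's __builtin_expect. Mainline SECCOMP kills an offending process, whereas this sketch returns -EPERM so the effect is easy to observe. This particular "build" checks open but not read.

```c
#include <errno.h>
#include <stdint.h>

#define SC_GROUP_BASIC  (1u << 0)
#define SC_GROUP_FILES  (1u << 1)
#define SC_ALLOW_ALL    UINT32_MAX

/* Hypothetical build options: this build checks open but not read. */
#define CONFIG_SECCOMP_CHECK_OPEN 1
/* CONFIG_SECCOMP_CHECK_READ is deliberately left undefined. */

/* Branch hint in the style of the kernel's unlikely(). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Stand-in for the current process's mask of permitted groups. */
static uint32_t current_allowed = SC_ALLOW_ALL;

/* One AND plus one branch that is almost never taken. */
#define SECCOMP_CHECK(group)                          \
    do {                                              \
        if (unlikely(!(current_allowed & (group))))   \
            return -EPERM;                            \
    } while (0)

/* Hypothetical syscall bodies; the real work is elided. */
static long sketch_sys_open(void)
{
#ifdef CONFIG_SECCOMP_CHECK_OPEN
    SECCOMP_CHECK(SC_GROUP_FILES);
#endif
    return 0;
}

static long sketch_sys_read(void)
{
#ifdef CONFIG_SECCOMP_CHECK_READ
    SECCOMP_CHECK(SC_GROUP_BASIC);
#endif
    return 0;
}
```

Because the read check is compiled out here, a process limited to SC_GROUP_FILES can still call it in this build. Leaving a check out only ever permits more than a fully checked build would, which is why the choice is invisible to programs that already work.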

This is not meant to replace SELinux, jails, or other
security mechanisms, but to supplement them. This makes it
easier for a developer to limit the damage a process might
unintentionally do, regardless of the setup of the end user.
Lest anyone get the wrong idea, "magic security dust" is of
course tongue-in-cheek - security issues require thought and
care with or without this patch. This is just another tool, one
that attacks these issues from a slightly different direction.

Specific points for further discussion, if this is something
people are interested in:

* Which syscalls are grouped under what flags, and what
appropriate names for those flags might be. Bear in mind that
starting from an over-constrained set of flags provides a better
path forward: adding new flags, or adding syscalls to existing
flags, never takes a permission away, so nothing that worked
before will break.

* What to do on fork or exec once they are permitted. Note that
the code I already have reflects no decision here, as there is
presently no flag that permits either of them.
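On the first point, the claim that an over-constrained starting set is the safer path can be checked mechanically. This is a sketch under assumptions: the syscall numbers (NR_*), group bits (G_*) and the three table versions are invented for illustration and are not proposed groupings. Each table maps a syscall to the groups that permit it, and no_regressions asks whether every syscall a given mask could make under the old table is still permitted under the new one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical syscall numbers and group bits, for illustration only. */
enum { NR_READ, NR_WRITE, NR_OPEN, NR_STAT, NR_COUNT };
#define G_BASIC (1u << 0)
#define G_FILES (1u << 1)
#define G_META  (1u << 2)

/* Each entry is the set of groups that permit the syscall (0 = none). */
static const uint32_t table_v1[NR_COUNT] = {
    [NR_READ] = G_BASIC, [NR_WRITE] = G_BASIC, [NR_OPEN] = G_FILES,
};
/* v2 only adds: stat joins G_FILES and the new group G_META. */
static const uint32_t table_v2[NR_COUNT] = {
    [NR_READ] = G_BASIC, [NR_WRITE] = G_BASIC, [NR_OPEN] = G_FILES,
    [NR_STAT] = G_FILES | G_META,
};
/* v3 takes something away: open moves from G_FILES to G_META. */
static const uint32_t table_v3[NR_COUNT] = {
    [NR_READ] = G_BASIC, [NR_WRITE] = G_BASIC, [NR_OPEN] = G_META,
};

/* True if every syscall permitted to `allowed` under old_t is still
 * permitted under new_t, i.e. no previously working program breaks. */
static bool no_regressions(const uint32_t *old_t, const uint32_t *new_t,
                           uint32_t allowed)
{
    for (size_t nr = 0; nr < NR_COUNT; nr++) {
        bool before = (old_t[nr] & allowed) != 0;
        bool after = (new_t[nr] & allowed) != 0;
        if (before && !after)
            return false;
    }
    return true;
}
```

Adding a syscall to a group, or adding a new group, only turns 0 bits into 1 bits in the table, so no_regressions holds for every mask; only moving or removing a syscall, as in table_v3, can break a program that ran before.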