Re: [PATCH 3/7] seccomp_filter: Enable ftrace-based system call filtering

From: Frederic Weisbecker
Date: Thu Apr 28 2011 - 11:20:33 EST


On Thu, Apr 28, 2011 at 05:12:44PM +0200, Frederic Weisbecker wrote:
> On Wed, Apr 27, 2011 at 10:08:47PM -0500, Will Drewry wrote:
> > This change adds a new seccomp mode based on the work by
> > agl@xxxxxxxxxxxxx This mode comes with a bitmask of NR_syscalls size and
> > an optional linked list of seccomp_filter objects. When in mode 2, all
> > system calls are first checked against the bitmask to determine if they
> > are allowed or denied. If allowed, the list of filters is checked for
> > the given syscall number. If all filter predicates for the system call
> > match or the system call was allowed without restriction, the process
> > continues. Otherwise, it is killed and a KERN_INFO notification is
> > posted.
> >
> > The filter language itself is provided by the ftrace filter engine.
> > Related patches tweak the perf filter trace and free to allow the calls
> > to be shared. Filters inherit their understanding of types and arguments
> > for each system call from the CONFIG_FTRACE_SYSCALLS subsystem which
> > predefines this information in syscall_metadata associated enter_event
> > (and exit_event) structures.
> >
> > The result is that a process may reduce its available interfaces to
> > the kernel through prctl() without knowing the appropriate system call
> > number a priori and with the flexibility of filtering based on
> > register-stored arguments. (String checks suffer from TOCTOU issues and
> > should be left to LSMs to provide policy for! Don't get greedy :)
> >
> > A sample filterset for a process that only needs to interact over stdin
> > and stdout and exit cleanly is shown below:
> > sys_read: fd == 0
> > sys_write: fd == 1
> > sys_exit_group: 1
> >
> > The filters may be specified once prior to entering the reduced access
> > state:
> > prctl(PR_SET_SECCOMP, 2, filters);
>
> Instead of having such a multiline filter definition with syscall
> names prepended, it would be nicer to make the parsing simpler.
>
> You could have either:
>
> prctl(PR_SET_SECCOMP, mode);
> /* Works only if we are in mode 2 */
> prctl(PR_SET_SECCOMP_FILTER, syscall_nr, filter);
>
> or:
> /*
> * If mode == 2, set the filter for syscall_nr.
> * Call this again for each syscall that needs a filter.
> * If a filter was previously set on the targeted syscall,
> * it will be overwritten.
> */
> prctl(PR_SET_SECCOMP, mode, syscall_nr, filter);
>
> One can erase a previous filter by setting the new filter to "1".
>
> Also, instead of having a bitmap of syscalls to accept, you could
> simply set "0" as the filter for those you want to deactivate:
>
> prctl(PR_SET_SECCOMP, 2, 1, 0); <- deactivate the syscall_nr 1


I meant "0" and not 0. Because a NULL filter would actually mean we
don't have a filter, which would be the same as "1".
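
To make the distinction concrete, here is a rough user-space sketch of the
semantics I have in mind. It is only a model, not kernel code: every model_*
name and the table size are made up for illustration, and treating an
out-of-range syscall number as denied is my assumption. It shows a NULL
filter behaving like "1" (allowed without restriction), the string "0"
deactivating the syscall, and any other string being handed to the filter
engine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table size, for illustration only (not NR_syscalls). */
#define MODEL_NR_SYSCALLS 8

/* One filter string per syscall number; NULL means "no filter installed". */
static const char *model_filters[MODEL_NR_SYSCALLS];

enum model_verdict {
    MODEL_DENY,     /* filter is the string "0" */
    MODEL_ALLOW,    /* no filter (NULL) or the string "1" */
    MODEL_EVALUATE  /* a real predicate, e.g. "fd == 0" */
};

/*
 * Models the proposed prctl(PR_SET_SECCOMP, 2, syscall_nr, filter):
 * install the filter for nr, overwriting any previous one.
 * Returns 0 on success, -1 if nr is out of range.
 */
static int model_set_filter(int nr, const char *filter)
{
    if (nr < 0 || nr >= MODEL_NR_SYSCALLS)
        return -1;
    model_filters[nr] = filter;
    return 0;
}

/* Decide what happens when syscall nr is made. */
static enum model_verdict model_check(int nr)
{
    const char *f;

    /* Assumption: unknown syscall numbers are denied. */
    if (nr < 0 || nr >= MODEL_NR_SYSCALLS)
        return MODEL_DENY;

    f = model_filters[nr];
    if (f == NULL || strcmp(f, "1") == 0)
        return MODEL_ALLOW;
    if (strcmp(f, "0") == 0)
        return MODEL_DENY;
    return MODEL_EVALUATE;
}

/* Usage: "0" deactivates, NULL and "1" both mean unrestricted. */
static int model_self_check(void)
{
    model_set_filter(1, "0");
    assert(model_check(1) == MODEL_DENY);

    model_set_filter(1, NULL);
    assert(model_check(1) == MODEL_ALLOW);

    model_set_filter(1, "1");
    assert(model_check(1) == MODEL_ALLOW);
    return 0;
}
```

So passing a literal 0 (NULL) in the example above would leave syscall 1
unrestricted; only the string "0" deactivates it.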

>
> Hm?