Re: [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF

From: Will Drewry
Date: Tue Jan 17 2012 - 16:09:19 EST


On Tue, Jan 17, 2012 at 2:42 PM, Will Drewry <wad@xxxxxxxxxxxx> wrote:
> On Tue, Jan 17, 2012 at 2:34 PM, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>> On Mon, Jan 16, 2012 at 10:46 PM, Indan Zupancic <indan@xxxxxx> wrote:
>>> So call it once and store the value in a long. Then copy the low half
>>> to the right place and then the upper half when on 64 bits. It may not
>>> look too pretty, but the compiler should be able to optimise almost all
>>> overhead away and end up with 6 (or 12) int copies. Something like this:
>>>
>>> struct bpf_data {
>>>        uint32 syscall_nr;
>>>        uint32 arg_low[MAX_SC_ARGS];
>>>        uint32 arg_high[MAX_SC_ARGS];
>>> };
>>>
>>> void fill_bpf_data(struct task_struct *t, struct pt_regs *r, struct bpf_data *d)
>>> {
>>>        int i;
>>>        unsigned long arg;
>>>
>>>        d->syscall_nr = syscall_get_nr(t, r);
>>>        for (i = 0; i < MAX_SC_ARGS; ++i){
>>>                syscall_get_arguments(t, r, i, 1, &arg);
>>>                d->arg_low[i] = arg;
>>>                /* widen first: shifting a 32-bit long by 32 is undefined */
>>>                d->arg_high[i] = (uint64_t)arg >> 32;
>>>        }
>>> }
>>
>> If this turns out to be expensive, it might be possible to break it up
>> and load the arguments on demand (and cache them); i.e. have
>> load_pointer() or similar notice when it is about to access something
>> other than bpf_data.syscall_nr.
>
> Makes perfect sense!  In theory (as a few other people pointed this
> out off list), it is entirely possible to never populate any data for
> load_pointer except an optional cache.  Just provide a custom
> load_pointer that knows to take the offset and return the syscall nr or
> the args or some slice of the returned data.
>
> This is even easier if the struct looks like:
> struct {
>  int nr;
>  union {
>    uint32_t args32[6];
>    uint64_t args64[6];
>  };
> };
>
> since you can just use the offset without doing any endian-based
> splitting.  Another suggestion (thanks roland!) was to add
>  int syscall_arch;
> to the struct populated with the AUDIT_ARCH_* defines.  This would
> help the case Indan was worried about -- portable filter programs.
>
> It looks like there'd be some cross-arch plumbing to make the
> AUDIT_ARCH_ data available, but not too bad.
>
> Seem sane? I'm headed down this path now and I think it'll work out
> assuming there aren't major objections to the syscall_arch piece.
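
For illustration only, here is a rough user-space sketch of the
"never populate anything up front" idea, using the struct quoted above.
It is not the patch: fake_regs, lazy_view and lazy_load_word are
hypothetical names, and fake_regs merely stands in for whatever the
kernel would read via syscall_get_arguments(). The loader maps a byte
offset to either nr or a slice of one argument, and fetches an argument
only the first time some word of it is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SC_ARGS 6

/* Layout the filter program sees; mirrors the struct quoted above. */
struct seccomp_view {
    int nr;
    union {
        uint32_t args32[MAX_SC_ARGS];
        uint64_t args64[MAX_SC_ARGS];
    };
};

/* Hypothetical stand-in for the register state the kernel would read. */
struct fake_regs {
    int nr;
    uint64_t args[MAX_SC_ARGS];
    unsigned fetches;            /* how many arguments were actually fetched */
};

struct lazy_view {
    struct fake_regs *regs;
    struct seccomp_view cache;   /* filled in one argument at a time */
    unsigned valid;              /* bit i set: cache.args64[i] is populated */
};

/*
 * Load the 32-bit word at byte offset 'off' of struct seccomp_view.
 * Returns 0 on success, -1 for a misaligned or out-of-range offset.
 */
static int lazy_load_word(struct lazy_view *v, size_t off, uint32_t *out)
{
    size_t base = offsetof(struct seccomp_view, args64);
    size_t idx;

    if (off % sizeof(uint32_t) != 0 ||
        off + sizeof(uint32_t) > sizeof(struct seccomp_view))
        return -1;
    if (off == offsetof(struct seccomp_view, nr)) {
        *out = (uint32_t)v->regs->nr;   /* no argument fetch needed */
        return 0;
    }
    if (off < base)
        return -1;                      /* padding between nr and the union */

    idx = (off - base) / sizeof(uint64_t);
    if (!(v->valid & (1u << idx))) {
        v->cache.args64[idx] = v->regs->args[idx];
        v->regs->fetches++;
        v->valid |= 1u << idx;
    }
    /* The offset alone selects the slice; no endian-specific splitting. */
    memcpy(out, (const char *)&v->cache + off, sizeof(*out));
    return 0;
}
```

A filter that only ever looks at offset 0 never triggers an argument
fetch; reading both halves of one argument costs a single fetch. Which
half sits at the lower offset still depends on the machine's byte order.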

Hrm. I'm still not so sure about the arch bit. Without it, BPF
programs aren't directly shareable, but they could be as long as the
values for k and the syscall numbers are adapted per arch. Exposing
arch to the program makes it more likely that every system call will
hit a BPF preamble that has to check syscall_arch. That could
easily add hundreds of nanoseconds to every call (on slower arches).

I'll probably do the next patch series without arch-checking support;
I can add it later if it seems needed. Nothing forces a filter program
to check it, so it could be that we let the author make the decision.
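
To make the preamble cost concrete, here is a rough sketch of what such
a check might look like as classic BPF, run by a toy interpreter so the
extra instructions can be counted. Everything specific here is an
assumption for illustration: the filter_data layout, the "nonzero return
allows" convention, the allowed syscall number, and the names insn,
filter_data and run. The opcode values and the AUDIT_ARCH_X86_64 value
are repeated from <linux/filter.h> and <linux/audit.h> so the sketch
builds standalone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic BPF opcode values, as in <linux/filter.h>. */
#define BPF_LD   0x00
#define BPF_JMP  0x05
#define BPF_RET  0x06
#define BPF_W    0x00
#define BPF_ABS  0x20
#define BPF_JEQ  0x10
#define BPF_K    0x00

/* Same shape as struct sock_filter. */
struct insn { uint16_t code; uint8_t jt, jf; uint32_t k; };

/* Assumed data layout: the struct from this thread plus the arch field. */
struct filter_data {
    int nr;
    uint32_t syscall_arch;
    uint64_t args64[6];
};

#define ARCH_X86_64 0xC000003Eu  /* numeric value of AUDIT_ARCH_X86_64 */
#define ALLOW 1u                 /* assumed return convention */
#define DENY  0u

static const struct insn prog[] = {
    /* preamble: deny unless the arch is the one this filter was written for */
    { BPF_LD | BPF_W | BPF_ABS,  0, 0, offsetof(struct filter_data, syscall_arch) },
    { BPF_JMP | BPF_JEQ | BPF_K, 1, 0, ARCH_X86_64 },
    { BPF_RET | BPF_K,           0, 0, DENY },
    /* policy proper: allow only syscall number 1 (arbitrary example) */
    { BPF_LD | BPF_W | BPF_ABS,  0, 0, offsetof(struct filter_data, nr) },
    { BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 1 },
    { BPF_RET | BPF_K,           0, 0, ALLOW },
    { BPF_RET | BPF_K,           0, 0, DENY },
};
#define PROG_LEN (sizeof(prog) / sizeof(prog[0]))

/* Toy interpreter for the three opcodes above; counts executed instructions. */
static uint32_t run(const struct insn *p, size_t len,
                    const struct filter_data *d, unsigned *steps)
{
    uint32_t a = 0;
    size_t pc = 0;

    *steps = 0;
    for (;;) {
        const struct insn *i;

        assert(pc < len);
        i = &p[pc++];
        ++*steps;
        switch (i->code) {
        case BPF_LD | BPF_W | BPF_ABS:      /* native byte order load */
            assert(i->k + sizeof(a) <= sizeof(*d));
            memcpy(&a, (const char *)d + i->k, sizeof(a));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:     /* offsets are relative to next insn */
            pc += (a == i->k) ? i->jt : i->jf;
            break;
        case BPF_RET | BPF_K:
            return i->k;
        default:
            return DENY;                    /* unknown opcode: fail closed */
        }
    }
}
```

Running prog from the start versus from prog + 3 (the policy without
the preamble) on an allowed call shows the check costing two extra
instructions per system call. A filter author who doesn't care about
portability could simply leave those first three instructions out.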

cheers!
will