Re: [PATCH v5 2/3] seccomp_filters: system call filtering using BPF

From: Indan Zupancic
Date: Wed Feb 01 2012 - 05:57:04 EST

On Wed, February 1, 2012 10:02, Will Drewry wrote:
> Hrm. It'd be the addition of a return value which is essentially an
> expansion of the possible filters. It wouldn't make the existing
> filters any less valid -- it's just that they wouldn't return errors.

True, but there will be a gap of who knows how many years during which
people try to run new filters on older (stable) kernels without return
support. It's just an extra cost on top of supporting BPF at all, and
creating filters isn't so easy that writing a second version for when the
error-returning one doesn't work is exactly fun.

> However, I realize there's a desire to have all the pieces in place
> upfront. I'll see how much work this really pans out to being. On x86,
> it's easy. Some other arches, a little less so, but probably not too
> bad.

Please send the updated basic and consolidated version first; I'd like to
see how it ended up looking.

> That's roughly how it is on x86 now except seccomp is after all the
> slow-path copy stuff. It'd be cool to bump it up in front of that
> work then pass through its return value. I'll poke around at this and
> look at the use of __secure_computing on the other arches to make sure
> I understand the impact. Maybe it is easier than I first thought it'd
> be.

Normal seccomp can be in the fast path, but I thought you sometimes need
to go through the slow path to have the registers saved at all. I would
prefer not to slow down the fast path with seccomp stuff.

> True. I'm not quite sure that it makes sense to have the BPF program
> decide what the tracer sees or doesn't see.

Only if ptrace agrees, by setting a special option. (And if multiple
filters are installed, they all have to set the ignore flag before ptrace
stops seeing it.)

> I can see it as a nice
> optimization for a sandbox implementation, but I could see something
> similar being done purely by letting a tracer catch disallowed
> syscalls (possibly via it registering an option indicating it wants
> seccomp_events).

If filters can't return errors, tracers never get to see disallowed events.
So all they can do is allow it and let the tracer block it. If you add a
ptrace option to overrule seccomp decisions, then ptrace and seccomp only
get more tangled without much gain.

> It wouldn't be quite as flexible, but it would avoid
> the simple filter becoming a more complex piece of logic and if a
> returned error were allowed (that didn't trigger ptrace), you'd not
> see a ridiculous amount of overhead from pointless syscalls. I'm not
> sure though.

It's not just that; it's also that not all filters want to kill the task
when it tries to make a forbidden system call. BPF is more flexible
than the very limited and controlled original seccomp.

> I'll poke at the ptrace bits too and see if one approach seems to fit
> better than another.

Let's do it step by step. That the final patch may have all features
doesn't mean you have to write it all at once. So first get the next
version out, then focus on supporting more archs and see how hard it
would be to have an error return.


