Re: [PATCH 5/7] seccomp_filter: Document what seccomp_filter isand how it works.

From: Frederic Weisbecker
Date: Wed May 04 2011 - 13:52:43 EST


On Wed, May 04, 2011 at 02:15:47AM -0700, Will Drewry wrote:
> Okay - so I *think* I'm following. I really like the use of SET and
> GET to allow for further constraint based on additional argument
> restrictions instead of purely reducing the filters available. The
> only part I'm stumbling on is using APPLY on a per-filter basis. In
> my current implementation, I consider APPLY to be the global enable
> bit. Whatever filters are set become set in stone and only &&s are
> handled. I'm not sure I understand why it would make sense to do a
> per-syscall-filter apply call.

Nope it's still a global apply..

> It's certainly doable, but it will
> mean that we may be logically storing something like:
>
> __NR_foo: (a == 1 || a == 2), applied
> __NR_foo: b == 2, not applied
> __NR_foo: c == 3, not applied
>
> after
>
> SECCOMP_FILTER_SET, __NR_foo, "a == 1 || a == 2"
> SECCOMP_FILTER_APPLY
> SECCOMP_FILTER_SET, __NR_foo, "b == 2"
> SECCOMP_FILTER_SET, __NR_foo, "c == 3"

No, the c == 3 would override b == 2.

> In that case, would a call to sys_foo even be tested against the
> non-applied constraints of b==2 or c==3?

No, not as long as it's not applied.

> Or would the call to set "c
> == 3" replace the "b == 2" entry. I'm not sure I see that the benefit
> exceeds the ambiguity that might introduce.

The rationale behind it is that as long as you haven't applied your filter,
you should be able to override it.

And the simplest way to override it is to do it completely. Remove what
was there before (that wasn't applied).

You could opt for a FILTER_DROP.
Let's take that example:

SECCOMP_FILTER_SET, __NR_foo, "b == 2"
SECCOMP_FILTER_SET, __NR_foo, "c == 3"

Following your logic, we should have "b == 2 && c == 3" after that.
How are you going to remove only the c == 3 sequence?

You would need SECCOMP_FILTER_DROP, __NR_foo, "c == 3"

That said, using a string with a filter expression as an id looks a
bit odd to me. That doesn't work anymore with "((c == 3))", or "c== 3",
unless you compare against the resulting tree but that's complicated.

I mean, that works, most of the time you keep your expression
and pass it as is. But the expression should be identified by its
meaning, not by some random language layout.

That also forces you to scatter your filter representation:

SECCOMP_FILTER_SET, __NR_foo, "b == 2"
SECCOMP_FILTER_SET, __NR_foo, "c == 3"

For me this shouldn't be different than

SECCOMP_FILTER_SET, __NR_foo, "b == 2 && c == 3"

Still nobody will be able to remove a part of the above expression.

So, I'm more for having something that removes everything not applied
than just a part of the non-applied filter.

Two possibilities I see:

SECCOMP_FILTER_SET, __NR_foo, "b == 2"
SECCOMP_FILTER_SET, __NR_foo, "c == 3"

// And result is "c == 3"

Or:

SECCOMP_FILTER_SET, __NR_foo, "b == 2"
SECCOMP_FILTER_SET, __NR_foo, "c == 3"

// And result is "b == 2 && c == 3"

SECCOMP_FILTER_RESET, __NR_foo //rewind to filter "0"

SECCOMP_FILTER_SET, __NR_foo, "c == 4"

// And result is "c == 4"

How does that look?



> However, if the default
> behavior it to always extend with &&, then a consumer of the interface
> could just do:
>
> SECCOMP_FILTER_SET, __NR_prctl, "option == 2"
> SECCOMP_FILTER_SET, __NR_foo, "a == 1 || a == 2"
> SECCOMP_FILTER_SET, __NR_foo, "b == 2"
> SECCOMP_FILTER_APPLY
>
> This would yield a final filter for foo of "(a == 1 || a == 2) && b ==
> 2". The call to APPLY would initiate the enforcement of the syscall
> filtering and enforce that no new filters may be added for syscalls
> that aren't already constrained. So you could still call
>
> SECCOMP_FILTER_SET, __NR_foo, "0"
>
> which is logically "((a == 1 || a == 2) && b == 2) && 0" and would be
> interpreted as just a DROP. But you could not do,
>
> SECCOMP_FILTER_SET, __NR_foo, "0"
> SECCOMP_FILTER_SET, __NR_foo, "1"
> or
> SECCOMP_FILTER_SET, __NR_read, "1"

Why?

"((a == 1 || a == 2) && b == 2) && 0 && 1"

or

"((a == 1 || a == 2) && b == 2) && 1"

does still follow our previous constraints.

>
> (since the implicit filter for all syscalls after an APPLY call should
> be "0" and additions would just be "0 && whatever").

If the default filter is 0 after apply, setting whatever afterward is
not an issue. You'll have "0 && whatever" which still means 0, that's fine.

>
> Am I missing something? If that makes sense, then we may even be able
> to reduce the extra directives by one and get a resulting interface
> that looks something like:
>
> /* Appends (&&) a new filter string for a syscall to the current
> filter value. "0" clears the filter. */
> prctl(PR_SET_SECCOMP_FILTER, syscall_nr, filter_string);
> /* Returns the current explicit filter string for a syscall */
> prctl(PR_GET_SECCOMP_FILTER, syscall_nr, buf, buflen);

So GET is eventually optional here, as a new filter automatically
append to the previous applied one.

But if a process needs some information about current filter, this
makes sense.

> /* Transition to a secure computing mode:
> * 1 - enables traditional seccomp behavior
> * 2 - enables seccomp filter enforcement and changes the implicit
> filter for syscalls from "1" to "0"
> */
> prctl(PR_SET_SECCOMP, mode);
>
> I'll set aside the v2 of the patch that uses ADD, DROP, and APPLY and
> work up a version with this interface. Do you (or anyone else :) feel
> strongly about per-syscall APPLY?

I'm still not sure what you meant by per syscall APPLY :)

> ~~
>
> As to the use of apply_on_exec, even if you whitelist: mmap, fstat64,
> brk, uname, open, read, close, set_thread_area, mprotect, munmap, and
> access _just_ to allow a process to be exec'd, it is still a
> significant reduction in kernel attack surface. Pairing that with a
> LSM, delayed chroot, etc could fill in the gaps with respect to a
> greater sandboxing solution. I'd certainly take that tradeoff for
> running binaries that I don't control the source for :) That said,
> LD_PRELOAD, ptrace injection, and other tricks could allow for the
> injection of a very targeted filterset, but I don't think that
> invalidates the on-exec case given the brittleness relying exclusively
> on such tactics. Does that seem reasonable?

Right. But then, does a default enable_on_exec behaviour really matches your
needs? Given your description, it seems that applying it on the next
fork would work too, as a random example.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/