Re: [PATCH 3/7] seccomp_filter: Enable ftrace-based system callfiltering

From: Steven Rostedt
Date: Thu Apr 28 2011 - 14:21:23 EST


On Thu, 2011-04-28 at 13:01 -0500, Will Drewry wrote:

> Good to know! My question (below) is if I should even be using an RCU
> guard at all. I may have been a bit too overzealous.
>
> >> >
> >> > I actually thought you were going to be more extreme about the seccomp
> >> > state than you are: I thought you were going to tie a filter list to
> >> > seccomp state. So adding or removing a filter would have required
> >> > duping the seccomp state, duping all the filters, making the change in
> >> > the copy, and then swapping the new state into place. Slow in the
> >> > hopefully rare update case, but safe.
>
> Hrm, I think I'm confused now! This is exactly what I *thought* the
> code was doing.

I guess my thought by looking at the code is the call_rcu() to free the
filters in drop_matching_filters().

Also, you have seccomp_drop_all_filters() as a standalone function that
is even exported to modules.

This to me, seems that the filters can disappear at any time. Because
the freeing is done with a rcu_call() the access to the filters needs a
ref count.

>
> At present, seccomp_state can be shared across predecessor/ancestor
> relationships using refcounting in fork.c (get/put). However, the
> only way to change a given seccomp_state or its filters is either
> through the one-bit on_next_syscall change or through
> prctl_set_seccomp. In prctl_set_seccomp, it does:
> state = (orig_state ? seccomp_state_dup(orig_state) :
> seccomp_state_new());
> operates on the new state and then rcu_assign_pointer()s it to the
> task. I didn't intentionally provide any way to drop filters from an
> existing state object nor change the filtered syscalls on an in-use
> object. That _dup call should hit the impromperly rcu_locked
> copy_all_filters returning duplicates of the original filters by
> reparsing the filter_string.
>
> Did I accidentally provide a means to mutate a state object or filter
> list without dup()ing? :/

That seccomp_drop_all_filters() looks like you can.

>
> >> > You don't have to do that, but then I'm pretty sure you'll need to add
> >> > reference counts to each filter and use rcu cycles to a reader from
> >> > having the filter disappear mid-read.
>
> Right now, I don't think it is possible for seccomp_copy_all_filters()
> to be called with a src list that changes since every change is
> guarded by a seccomp_state_dup(). If that's not true, then I violated
> my own invariant :/ If that is the case, should I not treat the list
> as an RCU list? There should never be any simultaneous
> reader/writers, just a single reader/writer or multiple readers.

Again, that seccomp_copy_all_filters() is called free standing (exported
to modules). Which to me means that it can be called by anyone at
anytime. There is no protection of this src list.

>
> >>
> >> Or you can preallocate the new filters, call rcu_read_lock(), check if
> >> the number of old filters is the same or less, if more, call
> >> rcu_read_unlock, and try allocating more, and then call rcu_read_lock()
> >> again and repeat. Then just copy the filters to the preallocate ones.
> >> rcu_read_unlock() and then free any unused allocated filters.
> >>
> >> Maybe a bit messy, but not that bad.
> >
> > Sounds good.
>
> I'd prefer a heavy-weight copy ;)
>
> I think I'm a bit lost -- am I missing something obvious here? I was
> hoping by using a swapped-in-seccomp_state-pointer, locking and
> consistency internal to the state objects would be a tad easier -
> though expensive.

Perhaps, but those free standing functions (the ones that are exported
to modules) seem like they can destroy state.

Your code would have been correct if you could call kzalloc under
rcu_read_lock() (which you can on some kernel configurations but not
all). The issue is that you need to pull out that allocation from the
rcu_read_lock() because rcu_read_lock assumes you can't preempt, and
that allocation can schedule out. The access to the filters must be done
under rcu_read_lock(), other than that, you're fine.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/