Re: [PATCH 3/7] seccomp_filter: Enable ftrace-based system call filtering

From: Will Drewry
Date: Thu Apr 28 2011 - 12:05:20 EST


On Thu, Apr 28, 2011 at 10:57 AM, Frederic Weisbecker
<fweisbec@xxxxxxxxx> wrote:
> On Thu, Apr 28, 2011 at 10:15:04AM -0500, Will Drewry wrote:
>> On Thu, Apr 28, 2011 at 9:29 AM, Frederic Weisbecker <fweisbec@xxxxxxxxx> wrote:
>> > On Wed, Apr 27, 2011 at 10:08:47PM -0500, Will Drewry wrote:
>> >> This change adds a new seccomp mode based on the work by
>> >> agl@xxxxxxxxxxxxx This mode comes with a bitmask of NR_syscalls size and
>> >> an optional linked list of seccomp_filter objects. When in mode 2, all
>> >
>> > Since you now use the filters, why not use them to filter syscalls
>> > entirely rather than using a bitmap of allowed syscalls?
>>
>> The current approach just uses a linked list of filters.  While a more
>> efficient data structure could be used, the bitmask provides a quick
>> binary decision and optimizes for the relatively common case where
>> there won't be many non-binary filters to evaluate: we don't have to
>> walk the list for the (likely larger) number of plain yes/no
>> decisions, only for the more complex predicates.  Though that may be
>> a short-sighted view! I'm happy to change it up.
>
> Well, using a hlist that points to the filters may not be that much slower.
> Dunno, that needs to be measured perhaps.
>
> No big deal for now.

Cool - that makes sense. I just haven't used hlist before and was
reluctant to dive in given how much other new territory this was. I'll
check it out, though.
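To make the bitmask-plus-list trade-off concrete, here is a minimal userspace C sketch. It is not the patch's code: every name (demo_seccomp, demo_filter, demo_allow, DEMO_NR_SYSCALLS) is hypothetical, and it assumes one particular semantics: a set bit allows a syscall outright, a clear bit falls back to walking the list for that syscall's argument predicate, and no match means deny.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical syscall count; the kernel's NR_syscalls is per-architecture. */
#define DEMO_NR_SYSCALLS 512
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_LONGS(n) (((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

/* An argument predicate for one syscall (stand-in for an ftrace filter). */
struct demo_filter {
    int nr;                                   /* syscall it applies to */
    bool (*match)(const unsigned long *args); /* true => allow */
    struct demo_filter *next;                 /* plain singly linked list */
};

struct demo_seccomp {
    unsigned long allowed[DEMO_LONGS(DEMO_NR_SYSCALLS)]; /* set bit => allow */
    struct demo_filter *filters;                         /* may be NULL */
};

static bool demo_test_bit(const unsigned long *map, int nr)
{
    return (map[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

static void demo_set_bit(unsigned long *map, int nr)
{
    map[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

/* Hypothetical example predicate: first argument equals 0. */
static bool demo_arg0_is_zero(const unsigned long *args)
{
    return args[0] == 0;
}

bool demo_allow(const struct demo_seccomp *s, int nr,
                const unsigned long *args)
{
    if (nr < 0 || nr >= DEMO_NR_SYSCALLS)
        return false;
    /* Fast path: one bit test settles the common yes/no case. */
    if (demo_test_bit(s->allowed, nr))
        return true;
    /* Slow path: walk the whole list looking for this syscall's predicate. */
    for (const struct demo_filter *f = s->filters; f; f = f->next)
        if (f->nr == nr)
            return f->match(args);
    return false; /* no bit, no filter: deny */
}
```

The fast path is a single bit test; only syscalls without a plain yes/no decision pay for the list walk, which is the case the bitmask is meant to optimize.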

>>
>> > You have the "nr" field in syscall tracepoints.
>>
>> I'm not sure I follow.  Do you mean moving entirely to using the
>> actual tracepoint infrastructure instead of using the seccomp hooks,
>> or just looking up the proper filter by syscall nr?  If there's a sane and
>> better way to do the latter, I'm all ears :)  As far as using the
>> tracepoints themselves, I looked at how the perf/ftrace interactions
>> worked, and while I could've registered with the syscall tracepoints
>> for enter and exit, it would mean later evaluation of the system call
>> interception, possibly out of order with respect to other registered
>> event sinks, and there is complexity in just killing current from
>> within the notifier-like list of registered syscall event handlers (as
>> Eric Paris ran into when expanding filtering into perf itself).  To get
>> around that, the tracepoint handler would have to pump the data
>> somewhere else (like it does for perf), and it just seemed messy.  I
>> think it's doable, but I don't know that the pure syscall tracepoint
>> infrastructure should be burdened with the added requirements that
>> come with seccomp filtering.  If I didn't properly understand the
>> code, though, please set me on the right path.
>
> No, my bad I was confused. I always post questions that show my
> misunderstanding of a new (or not) patchset. It's like a tradition ;)

Awesome - I was getting worried I'd missed something terribly obvious!
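For comparison, the per-syscall lookup raised in this thread (an hlist of filters per syscall nr, with no separate bitmap) might look roughly like this. Again, this is a hypothetical userspace sketch: demo_hlist_head and demo_hlist_node only mimic the shape of the kernel's hlist_head/hlist_node, and the assumed semantics are that an empty bucket denies, a filter with no predicate allows unconditionally, and any matching predicate allows.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical syscall count; the kernel's NR_syscalls is per-architecture. */
#define DEMO_NR_SYSCALLS 512

/* Userspace stand-ins shaped like the kernel's hlist: a one-pointer head
 * (cheap to keep one per syscall) and nodes that track the previous link. */
struct demo_hlist_node { struct demo_hlist_node *next, **pprev; };
struct demo_hlist_head { struct demo_hlist_node *first; };

static void demo_hlist_add_head(struct demo_hlist_node *n,
                                struct demo_hlist_head *h)
{
    n->next = h->first;
    if (h->first)
        h->first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static void demo_hlist_del(struct demo_hlist_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}

/* A filter attached to one syscall's bucket. */
struct demo_hfilter {
    struct demo_hlist_node node;
    bool (*match)(const unsigned long *args); /* NULL: allow unconditionally */
};

/* One hlist head per syscall number; an empty bucket means "deny". */
struct demo_table {
    struct demo_hlist_head heads[DEMO_NR_SYSCALLS];
};

/* Hypothetical example predicate: first argument equals 1. */
static bool demo_arg0_is_one(const unsigned long *args)
{
    return args[0] == 1;
}

bool demo_table_allow(const struct demo_table *t, int nr,
                      const unsigned long *args)
{
    if (nr < 0 || nr >= DEMO_NR_SYSCALLS)
        return false;
    /* Index straight to this syscall's bucket; walk only its filters. */
    for (struct demo_hlist_node *n = t->heads[nr].first; n; n = n->next) {
        struct demo_hfilter *f = (struct demo_hfilter *)
            ((char *)n - offsetof(struct demo_hfilter, node));
        if (!f->match || f->match(args))
            return true;
    }
    return false;
}
```

The lookup reaches the right bucket in O(1) by syscall nr, and the list walk covers only that syscall's filters. The cost is memory: 512 one-pointer heads take 4 KiB on a 64-bit build, versus 64 bytes for a 512-bit bitmap, which is the kind of trade-off that would need to be measured.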