Re: Potential issues (security and otherwise) with the current cgroup-bpf API

From: Andy Lutomirski
Date: Tue Jan 17 2017 - 15:34:37 EST


On Tue, Jan 17, 2017 at 5:58 AM, Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> On Tue 17-01-17 14:32:04, Peter Zijlstra wrote:
>> On Tue, Jan 17, 2017 at 02:03:03PM +0100, Michal Hocko wrote:
>> > On Sun 15-01-17 20:19:01, Tejun Heo wrote:
>> > [...]
>> > > So, what's proposed is a proper part of bpf. In terms of
>> > > implementation, cgroup helps by hosting the pointers but that doesn't
>> > > necessarily affect the conceptual structure of it. Given that, I
>> > > don't think it'd be a good idea to add anything to cgroup interface
>> > > for this feature. Introspection is great to have but this should be
>> > > introspectable together with other bpf programs using the same
>> > > mechanism. That's where it belongs.
>> >
>> > If BPF only piggy backs on top of cgroup to iterate tasks shouldn't we
>> > at least enforce that the cgroup has to be a leaf one and no further
>> > children groups can be created once there is BPF program attached?
>>
>> Why (again) this stupid constraint?
>>
>> If you want to use cgroups for tagging (like perf does), _any_ parent
>> cgroup will also tag you.
>>
>> So creating child cgroups, and placing tasks in it, should not be a
>> problem, the BPF thing should apply to all of them.
>
> This would require using hierarchical cgroup iterators to iterate over
> tasks. As per Andy's testing this doesn't seem to be the case. I haven't
> checked the implementation closely but my understanding was that using
> only cgroup specific tasks was intentional.

The current semantics are AFAIK that only the innermost cgroup that
has a hook installed is in effect. I think this is the wrong design.

I think that the right semantics are probably to support both
innermost-to-outermost and outermost-to-innermost and to select which
is appropriate for each hook. Suppose we have a cgroup /a/b where a
and b both have hooks installed. If the hook is a socket creation or
egress hook, I think that b's hook should run first. If b's hook
rejects, then a's hook is not run. If b's hook accepts, then a's hook
is run. This way a gets the last word on any changes to the socket
settings and a sees exactly what would happen if it were to accept.

Conversely, for ingress hooks, I think that a's hook should run first.
This way a sees the packet as it originally came in and can modify or
reject it, and then b only sees whatever a chooses to let through.

The guiding principle here is that, for actions that originate outside
the machine, the outer hooks should IMO run first and, for actions
that originate from a task in a cgroup, the innermost hooks should run
first.

--Andy