Re: [RFC v2 09/10] landlock: Handle cgroups

From: Andy Lutomirski
Date: Sun Aug 28 2016 - 04:15:15 EST


On Aug 27, 2016 8:12 PM, "Alexei Starovoitov"
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Sat, Aug 27, 2016 at 12:30:36AM -0700, Andy Lutomirski wrote:
> > > cgroup is the common way to group multiple tasks.
> > > Without cgroup only parent<->child relationship will be possible,
> > > which will limit usability of such lsm to a master task that controls
> > > its children. Such api restriction would have been ok, if we could
> > > extend it in the future, but unfortunately task-centric won't allow it
> > > without creating a parallel lsm that is cgroup based.
> > > Therefore I think we have to go with cgroup-centric api and your
> > > application has to use cgroups from the start though only parent-child
> > > would have been enough.
> > > Also I don't think the kernel can afford two bpf based lsm. One task
> > > based and another cgroup based, so we have to find common ground
> > > that suits both use cases.
> > > Having unprivileged access is a subset. There is no strong reason why
> > > cgroup+lsm+bpf should be limited to root only always.
> > > When we can guarantee no pointer leaks, we can allow unpriv.
> >
> > I don't really understand what you mean. In the context of landlock,
> > which is a *sandbox*, can one of you explain a use case that
> > materially benefits from this type of cgroup usage? I haven't thought
> > of one.
>
> In case of seccomp-like sandbox where parent controls child processes
> cgroup is not needed. It's needed when container management software
> needs to control a set of applications. If we can have one bpf-based lsm
> that works via cgroup and without, I'd be fine with it. Right now
> I haven't seen a plausible proposal to do that. Therefore cgroup based
> api is a common api that works for sandbox as well, though requiring
> parent to create a cgroup just to control a single child is cumbersome.
>

I don't believe that a common API can work to accomplish your goal.
For privileged container management, the manager is trusted. For
unprivileged sandboxing, the manager is emphatically not trusted,
which means you need special rules like NO_NEW_PRIVS, and, unless you
want to start restricting setuid and such in some cgroups, you really
do need a different interface for joining the sandbox than whatever
the container manager is using.

What could make sense is to have one BPF-based LSM that supports both
a seccomp-like unprivileged interface and a cgroup-based privileged
interface. Most of the code for it is the BPF part anyway -- all that
the cgroup or seccomp part needs to do is to figure out which BPF
program(s) to call.
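
To make that split concrete, here is a rough userspace toy model in Python.
It is not kernel code and not taken from the Landlock patches. Every name in
it (ToyLsm, attach_to_cgroup, attach_to_self, the event strings, the
"privileged" flag standing in for a capability check) is made up for
illustration, and exact-match cgroup lookup is a simplifying assumption. The
point is only that the two front-ends differ in who may attach a program and
how a task ends up covered by it, while the enforcement path is shared.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Stand-in for a BPF program: takes an event name, returns True to allow.
Program = Callable[[str], bool]


@dataclass
class Task:
    pid: int
    cgroup: str                      # hypothetical cgroup path
    no_new_privs: bool = False       # models the NO_NEW_PRIVS task flag
    privileged: bool = False         # assumption: stands in for a capability check
    task_programs: List[Program] = field(default_factory=list)


class ToyLsm:
    """One enforcement core with two ways of attaching programs."""

    def __init__(self) -> None:
        self.cgroup_programs: Dict[str, List[Program]] = {}

    # Privileged front-end: a trusted manager attaches to a cgroup.
    def attach_to_cgroup(self, manager: Task, cgroup: str, prog: Program) -> None:
        if not manager.privileged:
            raise PermissionError("cgroup attach requires a privileged manager")
        self.cgroup_programs.setdefault(cgroup, []).append(prog)

    # Unprivileged, seccomp-like front-end: a task restricts itself.
    # The untrusted caller must have given up the ability to gain privileges.
    def attach_to_self(self, task: Task, prog: Program) -> None:
        if not (task.no_new_privs or task.privileged):
            raise PermissionError("self attach requires no_new_privs")
        task.task_programs.append(prog)

    # Children inherit the parent's per-task programs, cgroup, and flag.
    def fork(self, parent: Task, pid: int) -> Task:
        return Task(pid=pid, cgroup=parent.cgroup,
                    no_new_privs=parent.no_new_privs,
                    task_programs=list(parent.task_programs))

    # The only front-end-specific step at enforcement time: program lookup.
    def programs_for(self, task: Task) -> List[Program]:
        return task.task_programs + self.cgroup_programs.get(task.cgroup, [])

    # Shared core: every applicable program must allow the event.
    def check(self, task: Task, event: str) -> bool:
        return all(prog(event) for prog in self.programs_for(task))
```

In this model a manager with privileged=True can call attach_to_cgroup() for
any cgroup, whereas an ordinary task can only call attach_to_self(), and only
after setting no_new_privs. check() does not care which path was used.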

Also, for container management software, you don't really need
everything tied to cgroup -- you just need a way to cleanly add new
processes to the same security context.