Re: cgroup access daemon

From: Serge Hallyn
Date: Fri Jun 28 2013 - 15:21:37 EST


Quoting Tim Hockin (thockin@xxxxxxxxxx):
> On Fri, Jun 28, 2013 at 9:31 AM, Serge Hallyn <serge.hallyn@xxxxxxxxxx> wrote:
> > Quoting Tim Hockin (thockin@xxxxxxxxxx):
> >> On Thu, Jun 27, 2013 at 11:11 AM, Serge Hallyn <serge.hallyn@xxxxxxxxxx> wrote:
> >> > Quoting Tim Hockin (thockin@xxxxxxxxxx):
> > Could you give examples?
> >
> > If you have a white/academic paper I should go read, that'd be great.
>
> We don't have anything on this, but examples may help.
>
> Someone running as root should be able to connect to the "native"
> daemon and read or write any cgroup file they want, right? You could
> argue that root should be able to do this to a child-daemon, too, but
> let's ignore that.
>
> But inside a container, I don't want the users to be able to write to
> anything in their own container. I do want them to be able to make
> sub-cgroups, but only 5 levels deep. For sub-cgroups, they should be
> able to write to memory.limit_in_bytes, to read but not write
> memory.soft_limit_in_bytes, and not be able to read memory.stat.
>
> To get even fancier, a user should be able to create a sub-cgroup and
> then designate that sub-cgroup as "final" - no further sub-sub-cgroups
> allowed under it. They should also be able to designate that a
> sub-cgroup is "one-way" - once a process enters it, it can not leave.
>
> These are real(ish) examples based on what people want to do today.
> In particular, the last couple are things that we want to do, but
> don't do today.
>
> The particular policy can differ per-container. Production jobs might
> be allowed to create sub-cgroups, but batch jobs are not. Some user
> jobs are designated "trusted" in one facet or another and get more
> (but still not full) access.

Interesting, thanks.

I'll think a bit on how to best address these.
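To make the requested policy concrete, here is a minimal sketch of how a manager might evaluate it. Everything in it is hypothetical: cg_op, cg_policy, policy_allows, and the depth and "final" parameters are illustrative names, not part of cgroup-mgr or the kernel. The sketch assumes the daemon already knows how deep the target sits below the caller's container cgroup. It also assumes that reading memory.limit_in_bytes is allowed, which the examples don't say. The "one-way" rule is left out because it restricts task moves, not file access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request types a client could send to the manager. */
enum cg_op { CG_READ, CG_WRITE, CG_MKDIR };

/* Hypothetical per-file rule applied inside a container's cgroups. */
struct cg_file_rule {
    const char *file;
    bool can_read;
    bool can_write;
};

/* Hypothetical per-container policy; production and batch jobs could differ. */
struct cg_policy {
    int max_depth;                    /* sub-cgroup levels allowed below the container */
    const struct cg_file_rule *rules; /* files not listed here are denied */
    size_t nrules;
};

/*
 * depth:    0 for the container's own cgroup, 1 for a direct sub-cgroup, ...
 *           (for CG_MKDIR, the depth of the parent the new cgroup goes under)
 * is_final: for CG_MKDIR, whether that parent was marked "final"
 * file:     control file name; ignored for CG_MKDIR
 */
static bool policy_allows(const struct cg_policy *p, enum cg_op op,
                          int depth, bool is_final, const char *file)
{
    if (op == CG_MKDIR)
        return !is_final && depth + 1 <= p->max_depth;

    /* Nothing in the container's own cgroup is writable from inside it. */
    if (depth == 0 && op == CG_WRITE)
        return false;

    for (size_t i = 0; i < p->nrules; i++) {
        if (strcmp(p->rules[i].file, file) == 0)
            return op == CG_READ ? p->rules[i].can_read : p->rules[i].can_write;
    }
    return false; /* default deny for unlisted files */
}

/* The example policy from the quoted mail. */
static const struct cg_file_rule example_rules[] = {
    { "memory.limit_in_bytes",      true,  true  },
    { "memory.soft_limit_in_bytes", true,  false },
    { "memory.stat",                false, false },
};

static const struct cg_policy example_policy = {
    .max_depth = 5,
    .rules     = example_rules,
    .nrules    = sizeof(example_rules) / sizeof(example_rules[0]),
};
```

With this policy, policy_allows(&example_policy, CG_WRITE, 1, false, "memory.limit_in_bytes") returns true, while the same write at depth 0 is refused. Keeping the policy as data means a per-container variant (say, batch jobs with max_depth = 0) is just another table.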

> > At the moment I'm going on the naive belief that proper hierarchy
> > controls will be enforced (eventually) by the kernel - i.e. if
> > a task in cgroup /lxc/c1 is not allowed to mknod /dev/sda1, then it
> > won't be possible for /lxc/c1/lxc/c2 to take that access.
> >
> > The native cgroup manager (the one using cgroupfs) will be checking
> > the credentials of the requesting child manager for access(2) to
> > the cgroup files.
>
> This might be sufficient, or the basis for a sufficient access control
> system for users. The problem is that we have multiple jobs on a
> single machine running as the same user. We need to ensure that the
> jobs can not modify each other.

Would it be (long-term) feasible to run each job in its own user
namespace with a different mapping (all jobs running as uid 1000, but
uid 1000 mapped to a different host uid for each job)?
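Under that scheme, each job's namespace has its own uid_map, and the host-side uid is what the cgroup file permissions are checked against. Below is a minimal sketch of the translation. It assumes the text is in the kernel's /proc/<pid>/uid_map format, one "inside outside count" triple per line, and that the input is well-formed. map_uid_to_host is a hypothetical helper. The host ranges in the usage note are made-up values.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: translate a uid inside a user namespace to the
 * host uid, given text in uid_map format ("inside outside count" per line).
 * Returns false if the uid falls outside every mapped range.
 */
static bool map_uid_to_host(const char *map_text, unsigned long ns_uid,
                            unsigned long *host_uid)
{
    const char *line = map_text;

    while (line && *line) {
        unsigned long inside, outside, count;

        if (sscanf(line, "%lu %lu %lu", &inside, &outside, &count) == 3 &&
            ns_uid >= inside && ns_uid - inside < count) {
            *host_uid = outside + (ns_uid - inside);
            return true;
        }
        /* Advance to the next line, if any. */
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return false;
}
```

If job A is mapped by "0 100000 65536" and job B by "0 200000 65536", uid 1000 becomes host uid 101000 in job A and 201000 in job B, so the two jobs no longer share an owner on the host.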

> > It is a named socket.
>
> So anyone can connect? even with SO_PEERCRED, how do you know which
> branches of the cgroup tree I am allowed to modify if the same user
> owns more than one?

I was assuming that any process requesting management of
/c1/c2/c3 would have to be in one of its ancestor cgroups (i.e. /c1).
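A minimal sketch of that rule, assuming a Linux AF_UNIX stream socket. SO_PEERCRED gives the caller's pid, uid and gid. The daemon would then look up the caller's cgroup, for example in /proc/<pid>/cgroup (not shown), and require it to be a proper ancestor of the target. get_peer_cred and cgroup_is_ancestor are hypothetical names. Paths are assumed to be normalized, with no trailing slash. Comparing whole path components keeps /c1 from matching /c10.

```c
#define _GNU_SOURCE /* for struct ucred */
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fetch the credentials of the process at the other end of a connected
 * AF_UNIX socket. Returns 0 on success, -1 with errno set on failure. */
static int get_peer_cred(int fd, struct ucred *out)
{
    socklen_t len = sizeof(*out);

    return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len);
}

/* True if `ancestor` is a proper ancestor of `target`, comparing whole
 * path components. A cgroup is not its own ancestor. */
static bool cgroup_is_ancestor(const char *ancestor, const char *target)
{
    size_t n = strlen(ancestor);

    if (strcmp(ancestor, "/") == 0)
        return target[0] == '/' && target[1] != '\0';
    return strncmp(ancestor, target, n) == 0 &&
           target[n] == '/' && target[n + 1] != '\0';
}
```

So a request on /c1/c2/c3 would be accepted from a caller in /c1 or /c1/c2, and refused from a caller in /c2 or /c10.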

So if you have two jobs running as uid 1000, one under /c1 and one
under /c2, and one as uid 1001 running under /c3 (with the uids owning
the cgroups), then the file permissions will prevent 1000 and 1001
from walking over each other, while the cgroup manager will not allow
a process (child manager or otherwise) under /c1 to manage cgroups
under /c2 and vice versa.

> >> Do you have a design spec, or a requirements list, or even a prototype
> >> that we can look at?
> >
> > The readme at https://github.com/hallyn/cgroup-mgr/blob/master/README
> > shows what I have in mind. It (and the sloppy code next to it)
> > represent a few hours' work over the last few days while waiting
> > for compiles and in between emails...
>
> Awesome. Do you mind if we look?

No, but it might not be worth it (other than the readme) :) - so far
it's only served to help me think through what I want and need from
the mgr.

> > But again, it is completely predicated on my goal to have libvirt
> > and lxc (and other cgroup users) be able to use the same library
> > or API to make their requests whether they are on host or in a
> > container, and regardless of the distro they're running under.
>
> I think that is a good goal. We'd like to not be different, if
> possible. Obviously, we can't impose our needs on you if you don't
> want to handle them. It sounds like what you are building is the
> bottom layer in a stack - we (Google) should use that same bottom
> layer. But that can only happen iff you're open to hearing our
> requirements. Otherwise we have to strike out on our own or build
> more layers in-between.

I'm definitely open to your requirements - whether providing what
you need for another layer on top, or building it right in.

-serge