Re: cgroup: status-quo and userland efforts

From: James Bottomley
Date: Wed Jul 03 2013 - 13:11:53 EST


On Wed, 2013-07-03 at 01:57 +0200, Thomas Gleixner wrote:
> Lennart,
>
> On Sun, 30 Jun 2013, Lennart Poettering wrote:
> > On 29.06.2013 05:05, Tim Hockin wrote:
> > > But that's not my point. It seems pretty easy to make this cgroup
> > > management (in "native mode") a library that can either be wrapped in
> > > a thin veneer of a main() function or be used directly by systemd. The
> > > point is to solve all of the problems ONCE. I'm trying to make the
> > > case that systemd itself should be focusing on features and policies
> > > and awesome APIs.
> >
> > You know, getting this all right isn't easy. If you want to do things
> > properly, then you need to propagate attribute changes between the units you
> > manage. You also need something like a scheduler, since a number of
> > controllers can only be configured under certain external conditions (for
> > example: the blkio or devices controllers use major/minor parameters for
> > configuring per-device limits. Since major/minor assignments are pretty much
> > unpredictable these days -- and users probably want to configure things with
> > friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> > wait for devices to show up before we can configure the parameters.) Soo...
> > you need a graph of units, where you can propagate things, and schedule things
> > based on some execution/event queue. And the propagation and scheduling are
> > closely intermingled.
>
> you are confusing policy and mechanisms.
>
> The access to cgroupfs is mechanism.
>
> The propagation of changes, the scheduling of cgroupfs access and
> the correlation to external conditions are policy.
>
> What Tim is asking for is to have a common interface, i.e. a library
> which implements the low level access to the cgroupfs mechanism
> without imposing systemd-defined policies on it (it might implement a
> set of common useful policies, but that's a different discussion).
>
> That's definitely not an unreasonable request, because he wants to
> implement his own set of policies which are not necessarily the same
> as those which are implemented by systemd.
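
To make the mechanism/policy split concrete, here is a rough sketch of
the kind of thin, policy-free cgroupfs helper being asked for, using the
per-device blkio limit Lennart mentions as the example: resolve a stable
/dev/disk/by-id/* symlink to the major:minor the kernel wants, then write
the limit into an existing group. The mount point (the v1 blkio
controller at /sys/fs/cgroup/blkio) and the function name are assumptions
for illustration, not any existing library's API; the waiting for the
device to appear and the propagation between units that Lennart describes
would be policy layered on top of a call like this.

/*
 * Hypothetical, minimal "mechanism only" helper: no scheduling, no
 * propagation, no policy.  Assumes the cgroup v1 blkio controller is
 * mounted at /sys/fs/cgroup/blkio and that the target group exists.
 */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/* Write "MAJ:MIN <bps>" to <cgroup>/blkio.throttle.read_bps_device. */
static int cg_blkio_set_read_bps(const char *cgroup, const char *dev_link,
                                 unsigned long long bps)
{
        struct stat st;
        char path[4096];
        FILE *f;

        /* stat() follows the by-id symlink down to the block device node */
        if (stat(dev_link, &st) < 0 || !S_ISBLK(st.st_mode))
                return -1;

        snprintf(path, sizeof(path),
                 "/sys/fs/cgroup/blkio/%s/blkio.throttle.read_bps_device",
                 cgroup);

        f = fopen(path, "w");
        if (!f)
                return -1;

        fprintf(f, "%u:%u %llu\n",
                major(st.st_rdev), minor(st.st_rdev), bps);
        return fclose(f) ? -1 : 0;
}

int main(void)
{
        /* Example: cap reads from one disk at 10 MB/s for group "vps101". */
        return cg_blkio_set_read_bps("vps101",
                                     "/dev/disk/by-id/ata-EXAMPLE-DISK",
                                     10ULL * 1024 * 1024);
}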

Could I just add a "me too" to this from Parallels? We need the ability
to impose our own container policy on the kernel mechanisms.

Perhaps I should step back a bit and say, first of all, that we all use
the word "container" a lot, but if you analyse what we mean, you'll find
that a Google container is different from a Parallels/OpenVZ container,
which is different from an LXC container, and so on. How we all build
our containers is a policy we impose on the various cgroup and namespace
mechanisms within the kernel. We've spent a lot of discussion time over
the years making sure that the kernel mechanisms support all of our
different use cases, so I really don't want to see that change in the
name of simplifying the API.

I also don't think any quest for the one true container will be
successful for the simple reason that containers are best when tuned for
the job they're doing. For instance, at Parallels we do IaaS containers.
That means we can take a container, boot up any old Linux OS inside it,
and give you root on it in exactly the same way as you could for a
virtual machine. Google does something more like application containers
for job control, and some network companies do pure namespace containers
without any cgroup controllers at all. There's no one container
description that would fit all use cases.
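
To illustrate just that last case, a pure namespace container needs
nothing more than clone(2) with the CLONE_NEW* flags; no cgroup
controller is involved at all. This is only a rough sketch (it needs
root, and it is not how any of the companies named above actually builds
its containers):

/* Minimal namespace-only "container": new PID, UTS, IPC, mount and
 * network namespaces, no cgroups anywhere. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[1024 * 1024];

static int child(void *arg)
{
        (void)arg;
        sethostname("nsdemo", 6);  /* only changes the new UTS namespace */
        /* $$ prints 1: the shell is PID 1 inside the new PID namespace */
        execlp("sh", "sh", "-c", "hostname; echo pid=$$", (char *)NULL);
        perror("execlp");
        return 1;
}

int main(void)
{
        int flags = CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWUTS |
                    CLONE_NEWIPC | CLONE_NEWNET | SIGCHLD;
        pid_t pid;

        pid = clone(child, child_stack + sizeof(child_stack), flags, NULL);
        if (pid < 0) {
                perror("clone");
                return 1;
        }
        waitpid(pid, NULL, 0);
        return 0;
}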

So where we are is that the current APIs may be messy, but they support
all use cases and all container structure policies. If anyone, systemd
included, wants to do a new API, it must support all use cases as well.
Ideally, it should be agreed on and live in the kernel as well, rather
than being implemented as some userspace filter.

James

