Re: cgroup: status-quo and userland efforts

From: Tim Hockin
Date: Mon Jul 01 2013 - 02:07:07 EST


On Sun, Jun 30, 2013 at 12:39 PM, Lennart Poettering
<lpoetter@xxxxxxxxxx> wrote:
> Heya,
>
>
> On 29.06.2013 05:05, Tim Hockin wrote:
>>
>> Come on, now, Lennart. You put a lot of words in my mouth.
>
>
>>> I for sure am not going to make the PID 1 a client of another daemon.
>>> That's just wrong. If you have a daemon that is both conceptually the
>>> manager of another service and the client of that other service, then
>>> that's bad design and you will easily run into deadlocks and such. Just
>>> think about it: if you have some external daemon for managing cgroups,
>>> and you need cgroups for running external daemons, how are you going to
>>> start the external daemon for managing cgroups? Sure, you can hack
>>> around this, make that daemon special, and magic, and stuff -- or you
>>> can just not do such nonsense. There's no reason to repeat the fuckup
>>> that cgroup became in kernelspace a second time, but this time in
>>> userspace, with multiple manager daemons all with different and slightly
>>> incompatible definitions of what a unit to manage actually is...
>>
>>
>> I forgot about the tautology of systemd. systemd is monolithic.
>
>
> systemd is certainly not monolithic for almost any definition of that term.
> I am not sure where you are taking that from, and I am not sure I want to
> discuss on that level. This just sounds like FUD you picked up somewhere and
> are repeating carelessly...

It does a number of sort-of-related things. Maybe it does them better
by doing them together. I can't say, really. We don't use it at
work, and I am on Ubuntu elsewhere, for now.

>> But that's not my point. It seems pretty easy to make this cgroup
>> management (in "native mode") a library that can have either a thin
>> veneer of a main() function, while also being usable by systemd. The
>> point is to solve all of the problems ONCE. I'm trying to make the
>> case that systemd itself should be focusing on features and policies
>> and awesome APIs.
>
> You know, getting this all right isn't easy. If you want to do things
> properly, then you need to propagate attribute changes between the units you
> manage. You also need something like a scheduler, since a number of
> controllers can only be configured under certain external conditions (for
> example: the blkio or devices controller use major/minor parameters for
> configuring per-device limits. Since major/minor assignments are pretty much
> unpredictable these days -- and users probably want to configure things with
> friendly and stable /dev/disk/by-id/* symlinks anyway -- this requires us to
> wait for devices to show up before we can configure the parameters.) Soo...
> you need a graph of units, where you can propagate things, and schedule
> things based on some execution/event queue. And the propagation and
> scheduling are closely intermingled.
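
[Illustration of the major/minor point quoted above. This is only a
minimal sketch, not how systemd does it. The function names
(device_numbers, throttle_line, pending_and_ready) are hypothetical,
and it assumes the cgroup v1 blkio convention of "MAJOR:MINOR VALUE"
entries. A path that does not exist yet is reported as pending so the
caller can retry once the device shows up.]

```python
import os
import stat


def device_numbers(path):
    """Return (major, minor) for a block device node, or None if absent."""
    try:
        st = os.stat(path)  # follows symlinks such as /dev/disk/by-id/*
    except FileNotFoundError:
        return None  # device has not shown up yet; caller should retry later
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError(f"{path} is not a block device")
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def throttle_line(major, minor, bytes_per_sec):
    """Format one per-device entry as 'MAJOR:MINOR VALUE'."""
    return f"{major}:{minor} {bytes_per_sec}"


def pending_and_ready(limits):
    """Split {path: limit} into lines writable now and paths still missing."""
    ready, pending = [], []
    for path, limit in limits.items():
        nums = device_numbers(path)
        if nums is None:
            pending.append(path)  # configure later, when the device appears
        else:
            ready.append(throttle_line(*nums, limit))
    return ready, pending
```

[Anything left in the pending list is exactly the part that needs some
kind of event queue: something has to notice the device arriving and
call this again.]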

I'm really just talking about the most basic low-level substrate of
writing to cgroupfs. Again, we don't use udev (yet?) so we don't have
these problems. It seems to me that it's possible to formulate a
bottom layer that is usable by both systemd and non-systemd systems.
But, you know, maybe I am wrong and our internal universe is so much
simpler (and behind the times) than the rest of the world that
layering can work for us and not you.
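
[Illustration of the "basic low-level substrate" idea. This is a
minimal sketch under stated assumptions, not an existing library. The
names write_attr and read_attr are hypothetical, and the hierarchy
root is passed in by the caller rather than assumed. On a real
cgroupfs the kernel creates the attribute files when the directory is
made; on an ordinary directory this simply creates plain files, which
is handy for testing.]

```python
import os


def write_attr(root, group, attr, value):
    """Create the group directory if needed and write one attribute file."""
    group_dir = os.path.join(root, group)
    os.makedirs(group_dir, exist_ok=True)  # mkdir is what creates a cgroup
    attr_path = os.path.join(group_dir, attr)
    with open(attr_path, "w") as f:
        f.write(f"{value}\n")
    return attr_path


def read_attr(root, group, attr):
    """Read one attribute file back, stripped of trailing whitespace."""
    with open(os.path.join(root, group, attr)) as f:
        return f.read().strip()
```

[A layer like this carries no policy, no unit graph and no scheduling.
Whether that is enough to be useful to both systemd and non-systemd
systems is the open question in this thread.]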

> Now, that's pretty much exactly what systemd actually *is*. It implements a
> graph of units with a scheduler. And if you rip that part out of systemd to
> make this an "easy cgroup management library", then you simply turn what
> systemd is into a library without leaving anything. Which is just bogus.
>
> So no, if you say "seems pretty easy to make this cgroup management a
> library" then well, I have to disagree with you.
>
>
>>> We want to run fewer, simpler things on our systems, we want to reuse as
>>
>>
>> Fewer and simpler are not compatible, unless you are losing
>> functionality. Systemd is fewer, but NOT simpler.
>
>
> Oh, certainly it is. If we'd split up the cgroup fs access into a separate
> daemon of some kind, then we'd need some kind of IPC for that, and so you
> have more daemons and you have some complex IPC between the processes. So
> yeah, the systemd approach is certainly both simpler and uses fewer daemons
> than your hypothetical one.

Well, it SOUNDS like Serge is trying to develop this to demonstrate
that a standalone daemon works. That's what I am keen to help with
(or else we have to invent it ourselves). I am not really afraid of IPC
or of "more daemons". I much prefer simple agents doing one thing and
interacting with each other in simple ways. But that's me.

>>> much of the code as we can. You don't achieve that by running yet another
>>> daemon that does worse what systemd can anyway do simpler, easier and
>>> better.
>>
>>
>> Considering this is all hypothetical, I find this to be a funny
>> debate. My hypothetical idea is better than your hypothetical idea.
>
>
> Well, systemd is pretty real, and the code to do the unified cgroup
> management within systemd is pretty complete. systemd is certainly not
> hypothetical.

Fair enough - I did not realize you had already done all the work that
Serge is just starting out on.

>>> The least you could grant us is to have a look at the final APIs we will
>>> have to offer before you already imply that systemd cannot be a valid
>>> implementation of any API people could ever agree on.
>>
>>
>> Whoa, don't get defensive. I said nothing of the sort. The fact of
>> the matter is that we do not run systemd, at least in part because of
>> the monolithic nature. That's unlikely to change in this timescale.
>
>
> Oh, my. I am not sure what makes you think it is monolithic.

It is not a replacement for any one thing. It is a replacement for a
handful of things that we are not keen to change all at once. That's
all. I have not personally looked at what subsystems are able to be
compiled-out so we could do an incremental changeover, though, so
maybe it can work in different modes? I don't know. I am not
pursuing this anyway, so I am not the person to convince, regardless.

>> What I said was that it would be a shame if we had to invent our own
>> low-level cgroup daemon just because the "upstream" daemon was too
>> tightly coupled with systemd.
>
>
> I have no interest to reimplement systemd as a library, just to make you
> happy... I am quite happy with what we already have....
>
>
>> This is supposed to be collaborative, not combative.
>
>
> It certainly sounds *very* different in what you are writing.

Sorry, then. No offense intended. I'm just looking for opportunities
to not-replicate work, if this whole model is going to be thrust upon
me.

Tim