The reason I asked Frederic whether it would make more sense as
part of memcg wasn't about flexibility but mostly about the type of
resource involved. I'll continue below.
Agree. Even people aiming for unified hierarchies are okay with an
opt-in/out system, I believe. So the controllers need not be
active at all times. One way of doing this is what I suggested to
Frederic: if you don't limit, don't account.
I don't agree; it's a valid use case to monitor a workload without
limiting it in any way. I do it all the time.
AFAICS, this seems to be the most valid use case for different
controllers seeing different parts of the hierarchy, even if the
hierarchies aren't completely separate. Accounting and control living
in separate controllers is pretty sucky too, as it ends up accounting
the same things multiple times. Maybe all controllers should learn how
to do accounting w/o applying limits? Not sure yet.
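For what it's worth, cgroup v1's memcg already behaves roughly this way
when no limit is written: usage counters tick while the limit stays at
its unlimited default. A sketch (mount point and group name are
assumptions; needs root and the memory controller mounted):

```shell
# Sketch: monitor a workload's memory without limiting it (cgroup v1 memcg).
# Assumes the memory controller is mounted at /sys/fs/cgroup/memory.
mkdir /sys/fs/cgroup/memory/monitored
echo $$ > /sys/fs/cgroup/memory/monitored/tasks   # move this shell in

# We never write memory.limit_in_bytes, so the limit stays at its
# "unlimited" default, but accounting happens anyway:
cat /sys/fs/cgroup/memory/monitored/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/monitored/memory.limit_in_bytes
```

Whether every controller can be taught the same "account but don't
enforce" split is the open question above.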
To reraise a point from my other email that was ignored: do users
actually care about the number of tasks when they want to prevent
forkbombs? If a task used neither CPU nor memory, you would not be
interested in limiting the number of tasks.
Because the number of tasks is not a resource. CPU and memory are.
So again, if we included the memory impact of tasks properly
(structures, kernel stack pages) in the kernel memory counters which
we allow to be limited, shouldn't that solve our problem?
The task counter is trying to control the *number* of tasks, which is
purely memory overhead.
Translating #tasks into the actual amount of
memory isn't too trivial tho - the task stack isn't the only
per-task allocation, and the numbers should somehow make sense to
userland in a consistent way. Also, I'm not sure whether this
particular limit should live in its own silo or should be summed up
as part of kmem (kmem itself is in its own silo after all, apart from
user memory, right?).
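To make the translation problem concrete, here is a back-of-the-envelope
sketch. All sizes are illustrative assumptions, not values from any
particular kernel: the point is only that which per-task allocations get
charged changes the effective task count a given kmem limit permits.

```python
# Back-of-the-envelope: how many tasks fit under a given kmem limit?
# All sizes below are illustrative assumptions, not exact kernel values.

TASK_STRUCT = 8 * 1024     # assumed size of struct task_struct
KERNEL_STACK = 8 * 1024    # assumed kernel stack size (THREAD_SIZE)
OTHER_PER_TASK = 4 * 1024  # assumed mm_struct, files_struct, signal_struct, ...

def max_tasks(kmem_limit, charge_everything):
    """Tasks that fit if each fork is charged against the kmem limit."""
    per_task = TASK_STRUCT + KERNEL_STACK
    if charge_everything:
        per_task += OTHER_PER_TASK
    return kmem_limit // per_task

limit = 64 * 1024 * 1024  # hypothetical 64 MiB kmem limit

# Charging only stack + task_struct vs. charging all per-task allocations
# yields noticeably different effective task limits for the same knob:
print(max_tasks(limit, charge_everything=False))  # 4096
print(max_tasks(limit, charge_everything=True))   # 3276
```

That gap is exactly why a kmem-based fork-bomb defense is harder to
present to userland consistently than a plain task count.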
So, if those questions can be settled, I think protecting against fork
bombs would fit memcg better, in the sense that the whole design makes
more sense.