Re: [RFD] Merge task counter into memcg

From: Glauber Costa
Date: Thu Apr 12 2012 - 13:15:46 EST



The reason why I asked Frederic whether it would make more sense as
part of memcg wasn't about flexibility but mostly about the type of
the resource. I'll continue below.

Agree. Even people aiming for unified hierarchies are okay with an
opt-in/out system, I believe. So the controllers need not be
active at all times. One way of doing this is what I suggested to
Frederic: If you don't limit, don't account.

I don't agree: it's a valid use case to monitor a workload without
limiting it in any way. I do it all the time.

AFAICS, this seems to be the most valid use case for different
controllers seeing different parts of the hierarchy, even if the
hierarchies aren't completely separate. Accounting and control being
in separate controllers is pretty sucky too as it ends up accounting
things multiple times. Maybe all controllers should learn how to do
accounting w/o applying limits? Not sure yet.

Well...

* I don't know how blkcgrp applies limits
* the cpu cgroup is limiting by nature, in the sense that it divides shares in proportion to the number of cgroups in a hierarchy
* memcg has a RESOURCE_MAX default limit that is bigger than anything you can possibly count.

So one of the problems is that "limiting" may mean a different thing to each controller.

I am mostly talking about memory cgroup here. And there. "Accounting without limiting" can trivially be done by setting the limit to RESOURCE_MAX-delta. This won't work when we start having machines with 2^64 bytes of physical memory, but I guess we have some time until that happens.

The way I see it, it's just a technicality: a way to disable the accounting of a resource at runtime without filling the hierarchy with flags.
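
To make the "if you don't limit, don't account" idea concrete, here is a minimal userspace sketch, not kernel code. The names sketch_counter, sketch_charge and SKETCH_RESOURCE_MAX are made up for illustration, and the value chosen for the maximum is an assumption. A counter whose limit was never touched skips accounting entirely, while a limit of max-delta accounts without ever limiting in practice.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for RESOURCE_MAX; the exact value is an assumption. */
#define SKETCH_RESOURCE_MAX ((unsigned long long)LLONG_MAX)

struct sketch_counter {
    unsigned long long usage;   /* bytes currently charged */
    unsigned long long limit;   /* SKETCH_RESOURCE_MAX means "never configured" */
};

/* Accounting is on only if somebody moved the limit away from the default. */
static bool sketch_accounting_enabled(const struct sketch_counter *c)
{
    return c->limit != SKETCH_RESOURCE_MAX;
}

/* Returns 0 on success, -1 if the charge would exceed the limit. */
static int sketch_charge(struct sketch_counter *c, unsigned long long bytes)
{
    assert(c != NULL);

    if (!sketch_accounting_enabled(c))
        return 0;                       /* default limit: no accounting at all */

    /* Written this way so the comparison cannot overflow. */
    if (c->usage > c->limit || bytes > c->limit - c->usage)
        return -1;

    c->usage += bytes;
    return 0;
}
```

With the limit set to SKETCH_RESOURCE_MAX - 1, every realistic charge succeeds and shows up in usage, which is the monitoring-only case; no extra per-cgroup flag is needed.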


To reraise a point from my other email that was ignored: do users
actually really care about the number of tasks when they want to
prevent forkbombs? If a task would use neither CPU nor memory, you
would not be interested in limiting the number of tasks.

Because the number of tasks is not a resource. CPU and memory are.

So again, if we would include the memory impact of tasks properly
(structures, kernel stack pages) in the kernel memory counters which
we allow to limit, shouldn't this solve our problem?

The task counter is trying to control the *number* of tasks, which is
purely memory overhead.

No, it is not. As we talk, it is becoming increasingly clear that given the use case, the correct term is "translating tasks *back* into the actual amount of memory".

Translating #tasks into the actual amount of
memory isn't too trivial tho - the task stack isn't the only
allocation and the numbers should somehow make sense to the userland
in a consistent way. Also, I'm not sure whether this particular limit
should live in its silo or should be summed up together as part of
kmem (kmem itself is in its own silo after all apart from user memory,
right?).


It is accounted together, but limited separately. Setting memory.kmem.limit > memory.limit is a trivial way to say "don't limit kmem" (and yet account it).

The same thing would go for a stack limit (well, assuming it won't be merged into kmem itself as well).
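
One way to read "accounted together, but limited separately" is sketched below in userspace C. This is an illustration under assumptions, not the actual memcg code: sketch_memcg and sketch_charge_kmem are hypothetical names, and the sketch assumes a kernel-memory charge is added to both the kmem counter and the overall memory counter.

```c
#include <stdbool.h>

/* Hypothetical model: kernel memory counts toward both counters. */
struct sketch_memcg {
    unsigned long long mem_usage;   /* user + kernel bytes */
    unsigned long long mem_limit;
    unsigned long long kmem_usage;  /* kernel bytes only */
    unsigned long long kmem_limit;
};

enum sketch_result { SKETCH_OK, SKETCH_KMEM_LIMIT, SKETCH_MEM_LIMIT };

/* True if `bytes` more can be charged without exceeding `limit`. */
static bool sketch_fits(unsigned long long usage, unsigned long long limit,
                        unsigned long long bytes)
{
    return usage <= limit && bytes <= limit - usage;
}

/* Charge kernel memory: check both limits first, then update both usages. */
static enum sketch_result sketch_charge_kmem(struct sketch_memcg *cg,
                                             unsigned long long bytes)
{
    if (!sketch_fits(cg->kmem_usage, cg->kmem_limit, bytes))
        return SKETCH_KMEM_LIMIT;
    if (!sketch_fits(cg->mem_usage, cg->mem_limit, bytes))
        return SKETCH_MEM_LIMIT;

    cg->kmem_usage += bytes;
    cg->mem_usage += bytes;
    return SKETCH_OK;
}
```

In this model kmem_usage can never exceed mem_usage, so once kmem_limit is larger than mem_limit the kmem check can no longer be the one that fails. The kmem counter then only reports usage.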

So, if those can be settled, I think protecting against fork
bombs could fit memcg better in the sense that the whole thing makes
more sense.

I myself will advise against merging anything not byte-based into memcg.
"task counter" is not byte-based.
"fork bomb preventer" might be.
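
To show what a byte-based "fork bomb preventer" could look like, here is a rough userspace sketch. Everything in it is hypothetical: the struct names, the per-task byte figures passed in, and the idea that a fork is refused exactly when its kernel memory cannot be charged. It only illustrates that the limit stays expressed in bytes, with the task count falling out as a consequence.

```c
#include <stddef.h>

/* Hypothetical per-task kernel allocations, in bytes. */
struct sketch_task_cost {
    size_t stack_bytes;        /* kernel stack pages */
    size_t task_struct_bytes;  /* task structure itself */
    size_t other_bytes;        /* anything else charged per task */
};

/* Byte-based kernel memory counter for one group. */
struct sketch_kmem {
    size_t usage;
    size_t limit;
};

static size_t sketch_task_bytes(const struct sketch_task_cost *t)
{
    return t->stack_bytes + t->task_struct_bytes + t->other_bytes;
}

/* Returns 0 if the fork may proceed, -1 if its memory cannot be charged. */
static int sketch_fork_charge(struct sketch_kmem *k,
                              const struct sketch_task_cost *t)
{
    size_t bytes = sketch_task_bytes(t);

    if (k->usage > k->limit || bytes > k->limit - k->usage)
        return -1;
    k->usage += bytes;
    return 0;
}

/* Fork until refused or until `cap` attempts; returns the number of forks. */
static size_t sketch_fork_until_refused(struct sketch_kmem *k,
                                        const struct sketch_task_cost *t,
                                        size_t cap)
{
    size_t n = 0;

    while (n < cap && sketch_fork_charge(k, t) == 0)
        n++;
    return n;
}
```

No task count is ever configured here. The number of tasks a group can create is whatever its byte limit allows given the per-task cost.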