Re: [PATCH 00/10] cgroups: Task counter subsystem v6

From: Glauber Costa
Date: Fri Nov 04 2011 - 09:18:17 EST


On 11/03/2011 03:56 PM, Paul Menage wrote:
On Thu, Nov 3, 2011 at 10:35 AM, Glauber Costa <glommer@xxxxxxxxxxxxx> wrote:

If multiple subsystems on the same hierarchy each need to
walk up the pointer chain on the same event, then after the first
subsystem has done so the chain will be in cache for any subsequent
walks from other subsystems.

No, it won't. Precisely because different subsystems have completely
independent pointer chains.

Because they're following res_counter parent pointers, etc, rather
than using the single cgroups parent pointer chain?

No. Because:

/sys/fs/cgroup/my_subsys/
/sys/fs/cgroup/my_subsys/foo1
/sys/fs/cgroup/my_subsys/foo2
/sys/fs/cgroup/my_subsys/foo1/bar1

and:

/sys/fs/cgroup/my_subsys2/
/sys/fs/cgroup/my_subsys2/foo1
/sys/fs/cgroup/my_subsys2/foo1/bar1
/sys/fs/cgroup/my_subsys2/foo1/bar2

Are completely independent pointer chains. The only thing they share is the pointer to the root, and that's irrelevant in the pointer dance.
Also note that I used cpu and cpuacct as an example, and they don't use res_counters.
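To make the point concrete, here is a minimal user-space sketch, not kernel code. `struct node` and `walk_up()` are hypothetical stand-ins for a per-hierarchy cgroup and its parent walk, and the shared root pointer mentioned above is left out. Walking up from bar1 in one hierarchy never touches a node of the other, so the first walk does nothing to warm the cache for the second.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cgroup in one hierarchy. */
struct node {
	const char *name;
	struct node *parent;	/* NULL at the top of its own hierarchy */
};

/*
 * Walk from n to the top of its hierarchy. Returns the number of nodes
 * visited and, if top is non-NULL, stores the last node reached.
 */
static size_t walk_up(const struct node *n, const struct node **top)
{
	const struct node *last = NULL;
	size_t visited = 0;

	for (; n != NULL; n = n->parent) {
		last = n;
		visited++;
	}
	if (top)
		*top = last;
	return visited;
}

/* Hierarchy 1: my_subsys/foo1/bar1 */
static struct node s1_root = { "my_subsys", NULL };
static struct node s1_foo1 = { "foo1", &s1_root };
static struct node s1_bar1 = { "bar1", &s1_foo1 };

/* Hierarchy 2: my_subsys2/foo1/bar1, separate objects at every level */
static struct node s2_root = { "my_subsys2", NULL };
static struct node s2_foo1 = { "foo1", &s2_root };
static struct node s2_bar1 = { "bar1", &s2_foo1 };
```

A task sitting in both bar1 groups needs two full walks over two disjoint sets of nodes, even though the path names look the same.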

So if that's the problem, rather than artificially constrain
flexibility in order to improve micro-benchmarks, why not come up with
approaches that keep both the flexibility and the performance?

Well, I am not opposed to that even if you happen to agree on what I said above. But at the end of the day, with many cgroups appearing, it
may not be just about micro-benchmarks.

It is hard to draw the line, but I believe that avoiding the creation of new cgroup subsystems when possible works in our favor.

Specifically for this one, my arguments are:

* cgroups are a task-grouping entity
* therefore, all cgroups already do some task manipulation in attach/detach
* all cgroup subsystems can already register a fork handler

Adding a fork limit as a cgroup property seems a logical step to me based on that.
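As a rough illustration of what a fork limit on a task group amounts to, here is a user-space sketch, not the code from this patch series. `struct task_group` and `fork_charge()` are made-up names, and there is no locking. The fork handler would charge every level up the parent chain and undo the charge if any ancestor is already at its limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cgroup task accounting. */
struct task_group {
	struct task_group *parent;
	long nr_tasks;
	long limit;		/* negative means unlimited */
};

/*
 * Charge one task at g and at every ancestor of g.
 * Returns 0 on success. Returns -1 and leaves all counters untouched
 * if some level would go over its limit.
 */
static int fork_charge(struct task_group *g)
{
	struct task_group *c, *fail = NULL;

	for (c = g; c; c = c->parent) {
		if (c->limit >= 0 && c->nr_tasks + 1 > c->limit) {
			fail = c;
			break;
		}
		c->nr_tasks++;
	}
	if (!fail)
		return 0;

	/* Roll back the levels charged before the failing one. */
	for (c = g; c != fail; c = c->parent)
		c->nr_tasks--;
	return -1;
}
```

With a parent limited to two tasks and an unlimited child, the third fork in the child fails and both counters stay at two.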

If, however, we are really creating this, I think we'd be better off referring to this as a "Task Controller" rather than a "Task Counter".

Then at least in the near future, when people start trying to limit other task-related resources, this can serve as a natural placeholder for them. (See the syscall limiting that Lukasz is trying to achieve.)


- make res_counter hierarchies be explicitly defined via the cgroup
parent pointers, rather than a parent pointer hidden inside the
res_counter. So the cgroup parent chain traversal would all be along
the common parent pointers (and res_counter would be one pointer
smaller).

- allow subsystems to specify that they need a small amount of data
that can be accessed efficiently up the cgroup chain. (Many subsystems
wouldn't need this, and those that do would likely only need it for a
subset of their per-cgroup data). Pack this data into as few
cachelines as possible, allocated as a single lump of memory per
cgroup. Each subsystem would know where in that allocation its private
data lay (it would be the same offset for every cgroup, although
dynamically determined at runtime based on the number of subsystems
mounted on that hierarchy)

I thought about this second one myself.
I am not yet convinced this would be a win, but I believe there is a chance it would be.
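
For reference, the second proposal could look roughly like the following user-space sketch. Everything here is hypothetical (`struct hierarchy`, `struct cg`, `hierarchy_add_subsys()`, `cg_data()`, `charge_up()`), and it assumes each subsystem's hot data fits in a small fixed-size slot. Offsets are assigned once per hierarchy as subsystems are added, so they are the same for every cgroup in it, and each cgroup carries a single allocation holding all the slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBSYS 8

/* Hypothetical per-hierarchy layout: one offset per subsystem. */
struct hierarchy {
	size_t offset[MAX_SUBSYS];
	size_t nr_subsys;
	size_t blob_size;
};

/* Hypothetical cgroup: common parent pointer plus one packed blob. */
struct cg {
	struct cg *parent;
	unsigned char *blob;
};

/* Reserve a slot of `size` bytes; returns the subsystem id or -1. */
static int hierarchy_add_subsys(struct hierarchy *h, size_t size)
{
	size_t align = sizeof(long);
	size_t off;

	if (h->nr_subsys >= MAX_SUBSYS)
		return -1;
	off = (h->blob_size + align - 1) & ~(align - 1);
	h->offset[h->nr_subsys] = off;
	h->blob_size = off + size;
	return (int)h->nr_subsys++;
}

/* Allocate a cgroup with a zeroed blob sized for this hierarchy. */
static struct cg *cg_create(const struct hierarchy *h, struct cg *parent)
{
	struct cg *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->blob = calloc(1, h->blob_size ? h->blob_size : 1);
	if (!c->blob) {
		free(c);
		return NULL;
	}
	c->parent = parent;
	return c;
}

/* A subsystem's private slot inside a given cgroup. */
static void *cg_data(const struct hierarchy *h, const struct cg *c, int id)
{
	return c->blob + h->offset[id];
}

/* Bump a long counter in subsystem `id` at every level up the chain. */
static void charge_up(const struct hierarchy *h, struct cg *c, int id)
{
	for (; c; c = c->parent)
		(*(long *)cg_data(h, c, id))++;
}
```

Whether this wins depends on how many subsystems touch their slot on the same event: the walk follows one parent chain and one blob per level, but the blob grows with every subsystem mounted on the hierarchy.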