Re: [PATCH 1/9] Remove parent field in cpuacct cgroup

From: Glauber Costa
Date: Mon Sep 19 2011 - 12:31:33 EST


On 09/19/2011 01:19 PM, Peter Zijlstra wrote:
On Mon, 2011-09-19 at 13:09 -0300, Glauber Costa wrote:
On 09/19/2011 01:03 PM, Peter Zijlstra wrote:
On Wed, 2011-09-14 at 17:04 -0300, Glauber Costa wrote:
+ for (; ca; ca = parent_ca(ca)) {

It might be good to check that the loop condition and null condition in
the parent_ca() function get folded. Otherwise there's a double branch
in that loop.

Note that this function is one of the reasons I dislike cpuacct, it adds
a second cgroup hierarchy traversal to every context switch.
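
For illustration, the walk under discussion can be sketched in user-space C. This is a minimal sketch, not the kernel code: struct grp, grp_parent(), grp_charge() and grp_charge_direct() are hypothetical stand-ins (grp_parent() plays the role of parent_ca(), whose real body is not shown in this thread). It shows the loop test plus the helper's own NULL test, and that each charge costs one step per hierarchy level.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuacct group; only what the walk needs. */
struct grp {
	struct grp *parent;	/* NULL for the root group */
	uint64_t usage;		/* hierarchical usage, updated eagerly */
};

/* Assumed shape of a parent_ca()-like helper: it tolerates NULL itself. */
static inline struct grp *grp_parent(struct grp *g)
{
	return g ? g->parent : NULL;
}

/*
 * Same shape as the quoted loop: the loop condition tests g, and the
 * helper tests it again. After inlining, a compiler can drop the second
 * test because g is known to be non-NULL inside the loop body.
 * Returns the number of levels visited.
 */
static unsigned int grp_charge(struct grp *g, uint64_t delta)
{
	unsigned int visited = 0;

	for (; g; g = grp_parent(g)) {
		g->usage += delta;
		visited++;
	}
	return visited;
}

/* The same walk written with a single test per level. */
static unsigned int grp_charge_direct(struct grp *g, uint64_t delta)
{
	unsigned int visited = 0;

	for (; g; g = g->parent) {
		g->usage += delta;
		visited++;
	}
	return visited;
}
```

Either form visits every ancestor on each charge; whether the two NULL tests are actually merged depends on the compiler and on the helper being inlined, so checking the generated code is the only sure answer.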

Well, it is not that hard to optimize this.

Those values are always updated, but they don't really need to be,
unless they are read.

So what we can do, is introduce a marker in the cgroup, representing the
last read value. Parent is untouched. We then update parent when 1)
reading this value, 2) cgroup destroy, 3) cpu hotplug. (humm, and maybe
we don't even need to do it in cpu hotplug, since the per-cpu variables
will still be accessible... )

How about it ?
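
One way to read the proposal above, as a rough user-space sketch: each group keeps a marker recording how much of its usage has already been folded into its parent, charging touches only the group itself, and a read folds the pending deltas upward first. All names here (struct acct_group, acct_charge(), acct_read(), MAX_CHILDREN and so on) are hypothetical, the fixed-size child array is only for brevity, and per-cpu counters, locking, destroy and hotplug handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8	/* arbitrary limit for this sketch */

struct acct_group {
	struct acct_group *parent;
	struct acct_group *children[MAX_CHILDREN];
	size_t nr_children;
	uint64_t usage;		/* own charges plus deltas folded in from children */
	uint64_t pushed;	/* marker: part of usage already folded into parent */
};

static void acct_add_child(struct acct_group *parent, struct acct_group *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/* Hot path: constant time, the parent is left untouched. */
static void acct_charge(struct acct_group *g, uint64_t delta)
{
	g->usage += delta;
}

/* Fold everything still pending below g into g, children first. */
static void acct_flush_subtree(struct acct_group *g)
{
	for (size_t i = 0; i < g->nr_children; i++) {
		struct acct_group *c = g->children[i];

		acct_flush_subtree(c);
		g->usage += c->usage - c->pushed;
		c->pushed = c->usage;	/* advance the marker */
	}
}

/* Slow path: a read pays for the walk over the subtree below g. */
static uint64_t acct_read(struct acct_group *g)
{
	acct_flush_subtree(g);
	return g->usage;
}
```

In this reading the charge path no longer walks the hierarchy at all, and the cost moves to acct_read(), which visits every group below the one being read. A destroy path would presumably run the same fold for the dying group before unlinking it.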

Updating that value would involve iterating all tasks in the entire
cgroup subtree nested at whatever cgroup you're wanting to read.

No, it would not. Nothing is stored in the task; everything is stored in the cgroup. So it is O(h(n)), where n is the number of cgroups and h(n) the height of the cgroup tree.

The delayed update would be an entire subtree walk, that can be quite
expensive.
But the subtrees are small, because we are talking about the cgroup subtree, which can grow quite a lot in breadth, but rarely in depth.

Who wants these numbers and what for and at what frequency?
Does that really make sense?

Whoever wants /proc/stat numbers. Once, or maybe twice a sec would be the normal interval here for most use cases, I guess (top inside a container, for instance).

Even people reading these values much more frequently than that would not come close to doing it every tick, which makes this option cheaper than traversing the tree at each tick.

Btw, this works for cpuacct. For cpuusage, I am not sure this optimization is a valid one. Since this value is at least intended to provide a basis for cpu capping in the near future (Well, it is not there, but I think it is), it is expected to be used much more frequently by the kernel itself.