On Wed, 2011-09-14 at 17:04 -0300, Glauber Costa wrote:
Hey Peter,
[[ For those getting this twice: I sent it previously to containers
ml, but I guess it was out. Sending now to a broader audience anyway ]]
Hi,
This patchset is a simple initial proposal for a per-cgroup/container
display of /proc/stat. The display method is based on Daniel's idea of
exposing a file that can be bind mounted (Daniel, is that more or less
what you had in mind?)
To grab the stats themselves, I am (ab)using the cpuacct cgroup. The percpu
counters are dropped in favor of normal percpu pointers, so we can easily
track per-cpu quantities.
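To illustrate why plain per-cpu storage makes per-cpu quantities easy to track, here is a minimal userspace sketch. It is not code from the patches: NR_CPUS, the stat_idx enum, and the group_* helpers are hypothetical names chosen for illustration, and a fixed-size array stands in for the kernel's percpu allocation.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* hypothetical fixed CPU count for this sketch */

/* Hypothetical subset of the fields a /proc/stat line carries. */
enum stat_idx { STAT_USER, STAT_SYSTEM, STAT_NSTATS };

/* One slot per CPU; each CPU only ever updates its own slot. */
struct group_cpustat {
    uint64_t ticks[NR_CPUS][STAT_NSTATS];
};

/* Charge delta ticks of the given kind to the group on one CPU. */
void group_account(struct group_cpustat *g, int cpu,
                   enum stat_idx idx, uint64_t delta)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    g->ticks[cpu][idx] += delta;
}

/* Per-cpu value: a direct read, as needed for the "cpuN" lines. */
uint64_t group_cpu_read(const struct group_cpustat *g, int cpu,
                        enum stat_idx idx)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    return g->ticks[cpu][idx];
}

/* Aggregate value: sum over all CPUs, as needed for the "cpu" line. */
uint64_t group_total(const struct group_cpustat *g, enum stat_idx idx)
{
    uint64_t sum = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += g->ticks[cpu][idx];
    return sum;
}
```

With this layout both the individual per-cpu values and the aggregate are available from the same storage, at the cost of a loop over CPUs when reading the total.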
In case you guys like this idea, my TODO list would include the removal
of the show stat code in fs/proc/stat.c altogether, and the displaying
of some fields I haven't touched yet.
Also, to demonstrate one of the potential uses of such a method, I
implemented a feature commonly found in hypervisors - steal time - on top
of it. I argue that containers can/should also display steal time when
available. Turns out that due to the fact that we run on the same kernel,
steal time is quite easy to implement once we have per-container tick
accounting in place.
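As a rough sketch of how steal time could fall out of per-container tick accounting: the definition used here (a tick counts as steal when the container has a runnable task but something outside the container holds the CPU) is an assumption made for illustration, not necessarily what the patches do. The container_ticks struct and container_tick() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-container tick counters for a single CPU. */
struct container_ticks {
    uint64_t run;    /* ticks spent running the container's tasks */
    uint64_t steal;  /* ticks the container wanted the CPU but did not get it */
    uint64_t idle;   /* ticks the container had nothing runnable */
};

/*
 * Classify one timer tick from the container's point of view.
 * running:  a task of this container is on the CPU right now
 * runnable: this container has at least one task that wants to run
 */
void container_tick(struct container_ticks *c, bool running, bool runnable)
{
    assert(!running || runnable);  /* a running task is also runnable */

    if (running)
        c->run++;
    else if (runnable)
        c->steal++;   /* assumed definition: CPU was given to someone else */
    else
        c->idle++;
}
```

Because every tick lands in exactly one bucket, run + steal + idle always equals the number of ticks seen, which mirrors how a guest sees steal time under a hypervisor.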
Please let me know what you guys think.
Glauber Costa (9):
  Remove parent field in cpuacct cgroup
  Make cpuacct fields per cpu variables
  Include nice values in cpuacct
  Include irq and softirq fields in cpuacct
  Include guest fields in cpuacct
  Include idle and iowait fields in cpuacct
  Create cpuacct.proc.stat file
  per-cgroup boot time
  Report steal time for cgroup

 kernel/sched.c | 265 +++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 234 insertions(+), 31 deletions(-)
I hate it already.. it just smells of more senseless accounting
overhead.
Guys we should seriously trim back a lot of that code, not grow ever
more and more. The sad fact is that if you build a kernel with
cpu-cgroup support the context switch cost is more than double that of a
kernel without, and then you haven't even started creating cgroups yet.
Also, how doesn't all this duplicate part of cpuacct-cgroup?
/me won't actually look at the patches for a little while longer.