Re: [PATCH] proc: speedup /proc/stat handling

From: KAMEZAWA Hiroyuki
Date: Mon Jan 23 2012 - 20:27:17 EST


On Mon, 23 Jan 2012 14:33:54 +0400
Glauber Costa <glommer@xxxxxxxxxxxxx> wrote:

> On 01/23/2012 02:16 PM, KAMEZAWA Hiroyuki wrote:
> > On Fri, 20 Jan 2012 16:59:24 +0100
> > Eric Dumazet<eric.dumazet@xxxxxxxxx> wrote:
> >>
> >> An alternative to 1) would be to remember the largest m->count reached
> >> in show_stat()
> >>
> >
> > nice catch. But how about using usual seq_file rather than single_open() ?
> > I just don't like multi-page buffer for this small file...very much.
> >
> > A rough patch here, maybe optimization will not be enough. (IOW, this may be slow.)
> >
>
> I myself don't like it very much, at least at first sight.
> Even with optimizations applied, I doubt we can make this approach
> faster than what we currently do for /proc/stat.
>
IIUC, most of the cost comes from printing " 0".
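
To make the " 0" point concrete, here is a rough user-space sketch (not part
of the patch) that counts how many per-IRQ fields on the "intr" line are zero.
The helper name zero_irq_fields() and the sample text are made up for
illustration. It assumes the usual layout, where the first number after "intr"
is the grand total and the rest are per-IRQ counts.

```python
def zero_irq_fields(stat_text):
    """Return (zero_count, field_count) for the per-IRQ fields of the "intr" line.

    Assumes the first number after "intr" is the grand total and the
    remaining numbers are per-IRQ counts.
    """
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "intr":
            per_irq = fields[2:]  # skip "intr" and the grand total
            zeros = sum(1 for v in per_irq if v == "0")
            return zeros, len(per_irq)
    return 0, 0  # no "intr" line found


# Made-up sample; on a real box, pass open("/proc/stat").read() instead.
sample = "cpu 10 0 5 100 0 0 0\nintr 42 40 0 0 2 0 0 0 0\nctxt 1234\n"
zeros, total = zero_irq_fields(sample)
print("%d of %d per-IRQ fields are zero" % (zeros, total))
```

On a machine where most IRQ numbers never fire, the zero count should be close
to the field count, which is why the " 0" path is worth making cheap.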

> Also, the code gets a lot harder to read and grasp. Problem is, unlike
> most of the stuff using seq_file, /proc/stat shows a lot of different
> kinds of information, not a single kind of easily indexable information.
>

I think the current one is too simple. But yes, it may not be worth using the usual
seq_file sequence.

I did some optimization around number(). Because my environment is small,
the size of /proc/stat is only 2780 bytes.
[kamezawa@bluextal test]$ wc -c /proc/stat
2780 /proc/stat

The test program is this.
== test program (reads /proc/stat 1000 times) ==

#!/usr/bin/python

num = 0

# Keep one file handle open; rewind and re-read /proc/stat 1000 times.
with open("/proc/stat") as f:
    while num < 1000:
        data = f.read()
        f.seek(0, 0)
        num = num + 1


== Before patch (3.3-rc1) ==
[kamezawa@bluextal test]$ time ./stat_check.py

real 0m0.142s
user 0m0.022s
sys 0m0.117s

== After patch ==
[root@bluextal test]# time ./stat_check.py

real 0m0.096s
user 0m0.024s
sys 0m0.069s


==
In the above, most of the improvement comes from replacing seq_printf() with seq_puts().

If the number of CPUs increases, the most costly function will be kstat_irqs().

perf record after patch:
19.03% stat_check.py [kernel.kallsyms] [k] memcpy
7.83% stat_check.py [kernel.kallsyms] [k] seq_puts
6.83% stat_check.py [kernel.kallsyms] [k] kstat_irqs
5.75% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup
5.24% stat_check.py libpython2.6.so.1.0 [.] 0x8ccb0
4.35% stat_check.py [kernel.kallsyms] [k] vsnprintf
3.68% stat_check.py [kernel.kallsyms] [k] sub_preempt_count
3.45% stat_check.py [kernel.kallsyms] [k] number

If we can find kstat_irqs()==0 without walking all possible cpus, we can
cut most of the cost...
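
A minimal user-space model of that idea, assuming each IRQ could carry one
extra counter updated together with the per-cpu ones. This is not what the
patch does, and the class and method names are hypothetical. With such a
counter, a never-fired IRQ can be detected without touching the per-cpu array.

```python
class IrqCounter(object):
    """Toy model of one IRQ's statistics (hypothetical, not kernel code)."""

    def __init__(self, nr_cpus):
        self.per_cpu = [0] * nr_cpus  # stands in for the per-cpu counters
        self.total = 0                # assumed extra field, kept in sync

    def handle(self, cpu):
        # Every interrupt bumps both the per-cpu counter and the total.
        self.per_cpu[cpu] += 1
        self.total += 1

    def sum_slow(self):
        # Models walking all possible cpus to build the sum.
        return sum(self.per_cpu)

    def sum_fast(self):
        # Never-fired IRQs are answered from the single cached total.
        if self.total == 0:
            return 0
        return self.sum_slow()


idle = IrqCounter(nr_cpus=64)
busy = IrqCounter(nr_cpus=64)
busy.handle(3)
busy.handle(3)
busy.handle(10)
print("idle=%d busy=%d" % (idle.sum_fast(), busy.sum_fast()))
```

In the kernel, a shared per-IRQ counter would likely have its own cost on the
interrupt path (every cpu writing the same cache line), so this only
illustrates the trade-off, not a ready design.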

Anyway, this is my final version. I'll go to my usual work ;)

==