Re: [patch] dynticks core: Fix idle time accounting

From: Valdis . Kletnieks
Date: Mon Oct 02 2006 - 16:23:53 EST


On Mon, 02 Oct 2006 20:43:26 +0200, Thomas Gleixner said:
>
> The patch below fixes the accounting weirdness.

I think it's still slightly defective, or at least suffering from a
disconnect from what is actually going on - the numbers reported in
/proc/stat add up to the total number of timer interrupts, but that's not
necessarily representative of what happened...

% cat /proc/stat;sleep 15;cat /proc/stat
cpu 27634 0 7762 20470 881 331 252 0
cpu0 27634 0 7762 20470 881 331 252 0
intr 812332 631476 2960 0 4 4 12667 3 14 1 1 4 142891 114 0 22193 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 2187603
btime 1159817297
processes 4028
procs_running 1
procs_blocked 0
nohz total I:397276 S:379955 T:1187.393123 A:0.003125 E: 629447
cpu 27753 0 7818 20739 881 332 253 0
cpu0 27753 0 7818 20739 881 332 253 0
intr 819027 636542 2969 0 4 4 12801 3 14 1 1 4 144371 114 0 22199 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 2209881
btime 1159817297
processes 4033
procs_running 1
procs_blocked 0
nohz total I:401991 S:384494 T:1200.732924 A:0.003122 E: 634513


And the deltas between the sums for cpu0 are equal to the difference of
the first intr (where the timer is) - ticking along at about 446/sec over
that 15 second timeframe. And sure enough, the 'user' field is about 1/3 of
the total interrupts.
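
A minimal sketch for rechecking the cpu0 deltas from the two samples quoted
above. The field order (user, nice, system, idle, iowait, irq, softirq, steal)
is the one documented in proc(5); the helper names parse_cpu_line and delta
are made up for this illustration and are not part of any existing tool.

```python
# Field order of a "cpu" line in /proc/stat, as documented in proc(5).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn 'cpu0 27634 0 ...' into a dict of tick counters (hypothetical helper)."""
    values = [int(v) for v in line.split()[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def delta(before, after):
    """Per-field difference between two samples (hypothetical helper)."""
    return {name: after[name] - before[name] for name in FIELDS}


# The two cpu0 lines copied from the samples above.
before = parse_cpu_line("cpu0 27634 0 7762 20470 881 331 252 0")
after = parse_cpu_line("cpu0 27753 0 7818 20739 881 332 253 0")

d = delta(before, after)
total = sum(d.values())          # all accounted ticks in the interval
user_share = d["user"] / total   # fraction of those ticks charged to user

print(d)
print("total ticks:", total)
print(f"user share: {user_share:.2f}")
```

For these two samples the summed fields differ by 446 ticks, of which 119 are
charged to user.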

The breakage is that userspace tools like gkrellm, vmstat and top are quite
happy to say "oh, we averaged 446 ticks/sec over the last N seconds? That's
odd, but I can deal..." but unfortunately they then treat every tick as having
the same "width" - and the idle ones are probably twice as wide if not wider.
I'm not sure how to fix that.

(The "thought experiment" for this - imagine that over a 10 second period, an
idle machine takes 100 short timeslices for a running process, and 100 very
long sleeps, each 10 times as long as one of those timeslices. What should
/proc/stat report at that point?)
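
To put numbers on the thought experiment, here is a small sketch. It assumes
each sleep is exactly 10 times as long as each timeslice and that the 200
intervals fill the 10 seconds exactly; all names are illustrative only. It
compares the busy fraction you get by counting every interval as one
equal-width tick with the busy fraction measured in wall-clock time.

```python
# Thought experiment: 100 short running timeslices and 100 sleeps,
# each sleep 10x as long as a timeslice, together filling a 10 second window.
# All names here are illustrative assumptions, not kernel or tool APIs.
from fractions import Fraction

WINDOW = Fraction(10)   # length of the observation window, in seconds
N_RUN = 100             # short timeslices spent running
N_SLEEP = 100           # long idle sleeps
SLEEP_FACTOR = 10       # each sleep is 10x one timeslice

# N_RUN * s + N_SLEEP * SLEEP_FACTOR * s == WINDOW, solved for s.
slice_len = WINDOW / (N_RUN + N_SLEEP * SLEEP_FACTOR)
sleep_len = SLEEP_FACTOR * slice_len

# What a reader of tick counts sees if every interval counts as one tick.
busy_by_count = Fraction(N_RUN, N_RUN + N_SLEEP)

# What actually happened in wall-clock time.
busy_by_time = (N_RUN * slice_len) / WINDOW

print(f"timeslice: {float(slice_len) * 1000:.2f} ms")
print(f"sleep:     {float(sleep_len) * 1000:.2f} ms")
print(f"busy by tick count: {float(busy_by_count):.1%}")
print(f"busy by wall clock: {float(busy_by_time):.1%}")
```

Under these assumptions the tick-count view says 50% busy while the wall-clock
view says about 9% busy, which is the gap the question is pointing at.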
