Re: Scheduler accounting inflated for io bound processes.

From: Mike Galbraith
Date: Tue Jun 25 2013 - 12:02:19 EST


On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote:
> Running the below testcase shows each process consuming 41-43% of its
> respective cpu while per core idle numbers show 63-65%, a disparity of
> roughly 4-8%. Is this a bug, known behaviour, or consequence of the
> process being io bound?

All three, I suppose. Idle is indeed inflated when softirq load is
present. The exact numbers you see depend on which CPU time ACCOUNTING
config the kernel was built with.
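
For reference, the disparity in the quoted report is just the amount by
which per-process busy time plus per-core idle time overshoots 100% of one
CPU. A minimal Python sketch of that arithmetic, using only the ranges from
the report (the function name is made up for illustration):

```python
# Ranges quoted in the report, as (low, high) percentages.
busy = (41.0, 43.0)   # per-process %CPU
idle = (63.0, 65.0)   # per-core idle

def overshoot(busy_pct, idle_pct):
    """Amount by which busy + idle exceeds 100% of one CPU."""
    return busy_pct + idle_pct - 100.0

low = overshoot(busy[0], idle[0])
high = overshoot(busy[1], idle[1])
print(f"disparity: {low:.0f}-{high:.0f}%")
```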

There are lies, there are damn lies.. and there are statistics.

> 1. run sudo taskset -c 0 netserver
> 2. run taskset -c 1 netperf -H localhost -l 3600 -t TCP_RR & (start
> netperf with priority on cpu1)
> 3. run top, press 1 for multiple CPUs to be separated

CONFIG_TICK_CPU_ACCOUNTING, cpu[23] isolated:

cgexec -g cpuset:rtcpus netperf.sh 999&sleep 300 && killall -9 top

%Cpu2 :  6.8 us, 42.0 sy,  0.0 ni, 42.0 id,  0.0 wa,  0.0 hi,  9.1 si,  0.0 st
%Cpu3 :  5.6 us, 43.3 sy,  0.0 ni, 40.0 id,  0.0 wa,  0.0 hi, 11.1 si,  0.0 st
                                   ^^^^
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ P COMMAND
 7226 root      20   0  8828  336  192 S  57.6  0.0   2:49.40 3 netserver  100*(2*60+49.4)/300 = 56.4
 7225 root      20   0  8824  648  504 R  55.6  0.0   2:46.55 2 netperf    100*(2*60+46.55)/300 = 55.5

Ok, accumulated time ~agrees with %CPU snapshots.
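
The cross-check above, sketched in Python: convert top's TIME+ field to
seconds and divide by the 300s run. The helper names are made up, and the
"M:SS.hh" layout of TIME+ is assumed from the values shown, so treat this
as a sketch rather than a general parser.

```python
def cpu_seconds(time_plus):
    """Convert a TIME+ value of the form "M:SS.hh" to seconds."""
    minutes, seconds = time_plus.split(":")
    return int(minutes) * 60 + float(seconds)

def average_cpu_pct(time_plus, elapsed_s):
    """Average %CPU implied by accumulated CPU time over elapsed_s seconds."""
    return 100.0 * cpu_seconds(time_plus) / elapsed_s

ELAPSED_S = 300  # from "sleep 300" in the command line above

# (name, TIME+, %CPU snapshot) copied from the top output above
samples = [("netserver", "2:49.40", 57.6), ("netperf", "2:46.55", 55.6)]
for name, time_plus, snapshot in samples:
    avg = average_cpu_pct(time_plus, ELAPSED_S)
    print(f"{name}: accumulated {avg:.2f}% vs snapshot {snapshot}%")
```

Both averages land within about a point of the snapshots, which is all the
"~agrees" claims.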

cgexec -g cpuset:rtcpus taskset -c 3 schedctl -I pert 5

(pert is a self-calibrating tsc tight loop perturbation measurement
proggy. It enters the kernel once per 5s period for a write. It doesn't
care about post-period stats processing/output time, but it's running
SCHED_IDLE and gets VERY little CPU when competing, so it runs more or
less only when netserver is idle. Plenty good enough as a proxy for idle.)
...
cgexec -g cpuset:rtcpus netperf.sh 9999
...
pert/s: 81249 >17.94us: 24 min: 0.08 max: 33.89 avg: 8.24 sum/s:669515us overhead:66.95%
pert/s: 81151 >18.43us: 25 min: 0.14 max: 37.53 avg: 8.25 sum/s:669505us overhead:66.95%
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pert userspace tsc loop gets ~32% ~= idle upper bound, reported = ~40%,
disparity ~8%.
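
A rough Python sketch of how the pert line can be read. The interpretation
is an assumption on my part: sum/s is taken as microseconds of perturbation
per wall-clock second, and overhead as the same thing in percent. The
numbers in the line are at least consistent with that reading. The helper
name and regexes are made up for illustration.

```python
import re

# First pert line from the output above, copied verbatim.
PERT_LINE = ("pert/s: 81249 >17.94us: 24 min: 0.08 max: 33.89 "
             "avg: 8.24 sum/s:669515us overhead:66.95%")

def parse_pert(line):
    """Pull the fields used below out of one pert output line."""
    return {
        "per_s": float(re.search(r"pert/s:\s*(\d+)", line).group(1)),
        "avg_us": float(re.search(r"avg:\s*([\d.]+)", line).group(1)),
        "sum_us": float(re.search(r"sum/s:\s*(\d+)us", line).group(1)),
        "overhead_pct": float(re.search(r"overhead:\s*([\d.]+)%", line).group(1)),
    }

p = parse_pert(PERT_LINE)

# Time taken away from the loop, as a share of one second (assumed meaning).
stolen_pct = 100.0 * p["sum_us"] / 1_000_000
# What is left for pert's userspace loop: an upper bound on true idle.
left_for_pert = 100.0 - p["overhead_pct"]
# Idle reported by top for the same CPU in the earlier run.
reported_idle = 40.0

print(f"stolen {stolen_pct:.2f}%, left for pert {left_for_pert:.2f}%, "
      f"reported idle exceeds that by {reported_idle - left_for_pert:.2f}%")
```

The complement of the printed overhead comes out in the same ballpark as
the figures quoted above; the exact point or so depends on how you round.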

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ P COMMAND
23067 root      20   0  8828  340  196 R  57.5  0.0   0:19.15 3 netserver
23040 root      20   0  8208  396  304 R  42.7  0.0   0:35.61 3 pert
                                          ^^^^ ~10% disparity.

perf record -e irq:softirq* -a -C 3 -- sleep 00
perf report --sort=comm

99.80% netserver
0.20% pert

pert does ~zip softirq processing (timer+rcu) and ~zip squat kernel.

Repeat.

cgexec -g cpuset:rtcpus netperf.sh 3600
pert/s: 80860 >474.34us: 0 min: 0.06 max: 35.26 avg: 8.28 sum/s:669197us overhead:66.92%
pert/s: 80897 >429.20us: 0 min: 0.14 max: 37.61 avg: 8.27 sum/s:668673us overhead:66.87%
pert/s: 80800 >388.26us: 0 min: 0.14 max: 31.33 avg: 8.26 sum/s:667277us overhead:66.73%

%Cpu3 : 36.3 us, 51.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 12.1 si,  0.0 st
        ^^^^ ~agrees with pert
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ P COMMAND
23569 root      20   0  8828  340  196 R  57.2  0.0   0:21.97 3 netserver
23040 root      20   0  8208  396  304 R  42.9  0.0   6:46.20 3 pert
                                          ^^^^ pert is VERY nearly 100% userspace
                                               one of those numbers is a.. statistic
Kills pert...

%Cpu3 :  3.4 us, 42.5 sy,  0.0 ni, 41.4 id,  0.1 wa,  0.0 hi, 12.5 si,  0.0 st
         ^^^ ~agrees that pert's us claim did go away, but wth is up
             with sy, it dropped ~9% after killing ~100% us proggy. nak
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ P COMMAND
23569 root      20   0  8828  340  196 R  56.6  0.0   2:50.80 3 netserver

Yup, adding softirq load turns utilization numbers into.. statistics.
Pure cpu load idle numbers look fine.
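
For anyone wanting to watch the raw counters rather than top's rendering:
the %Cpu lines are derived from deltas of the per-CPU counters in
/proc/stat. A minimal Python sketch of that calculation follows. The
counter values are invented for illustration (they are not from the runs
above), and the function names are made up. With tick based accounting
each tick is charged to whatever the CPU happens to be doing when the tick
fires, so short softirq bursts between ticks can plausibly end up in the
wrong bucket.

```python
# Labels as top prints them, in /proc/stat column order:
# user nice system idle iowait irq softirq steal
FIELDS = ("us", "ni", "sy", "id", "wa", "hi", "si", "st")

def parse_cpu_line(line):
    """Parse one 'cpuN ...' line of /proc/stat into a dict of tick counters."""
    parts = line.split()
    return dict(zip(FIELDS, map(int, parts[1:9])))

def percentages(before, after):
    """Per-state share of the ticks that elapsed between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {k: b[k] - a[k] for k in FIELDS}
    total = sum(delta.values())
    return {k: 100.0 * v / total for k, v in delta.items()}

# Invented samples for one CPU, 1000 ticks apart.
before = "cpu3 1000 0 5000 4000 0 0 1200 0 0 0"
after = "cpu3 1034 0 5425 4415 1 0 1325 0 0 0"

pct = percentages(before, after)
print(", ".join(f"{pct[k]:.1f} {k}" for k in FIELDS))
```

On a live box the two samples would come from reading /proc/stat twice
with a sleep in between.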

-Mike
