Re: [lkp-robot] [sched/fair] 6d46bd3d97: netperf.Throughput_tps -11.3% regression

From: Mike Galbraith
Date: Fri Sep 15 2017 - 00:07:35 EST


On Thu, 2017-09-14 at 11:56 -0400, Rik van Riel wrote:
>
> On systems with SMT, it may make more sense for
> sync wakeups to look for idle threads of the same
> core, than to have the woken task end up on the
> same thread, and wait for the current task to stop
> running.

Depends.

homer:/root # taskset -c 3 pipe-test
1.412185 usecs/loop -- avg 1.412185 1416.2 KHz
homer:/root # taskset -c 2,3 pipe-test
2.298820 usecs/loop -- avg 2.298820 870.0 KHz
homer:/root # taskset -c 3,7 pipe-test
1.899164 usecs/loop -- avg 1.899164 1053.1 KHz

For pipe-test, having ~zero overlap as well as ~zero footprint, that's
a good choice, but..

homer:/root # taskset -c 3 tbench.sh 1 10 2>&1|grep Throughput
Throughput 844.04 MB/sec 1 clients 1 procs max_latency=0.042 ms
homer:/root # taskset -c 2,3 tbench.sh 1 10 2>&1|grep Throughput
Throughput 713.25 MB/sec 1 clients 1 procs max_latency=0.324 ms
homer:/root # taskset -c 3,7 tbench.sh 1 10 2>&1|grep Throughput
Throughput 512.866 MB/sec 1 clients 1 procs max_latency=0.454 ms

..for tbench, it's another story: my crusty ole Q6600 turns in a win by
scheduling the pair on separate L2 sharing cores, whereas on the more
modern SMT equipped i4790, targeting shared L2 is the worst choice.
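
For readers who haven't run this kind of test: below is a rough Python
sketch of a pipe ping-pong loop, the general pattern a pipe-test style
microbenchmark exercises. It is not the pipe-test source; the function
name pingpong() and its loops/cpus parameters are made up for
illustration, and the cpus argument only approximates what taskset -c
does in the runs above. Absolute numbers from Python will not be
comparable to the ones quoted.

```python
import os
import time


def pingpong(loops=10000, cpus=None):
    """Return the average microseconds per parent<->child round trip."""
    if cpus is not None and hasattr(os, "sched_setaffinity"):
        # Linux only. The mask is inherited by the child across fork(),
        # and the calling process stays pinned afterwards.
        os.sched_setaffinity(0, cpus)

    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte straight back and touch no other data,
        # so the working set is tiny and the two tasks never run
        # usefully at the same time.
        os.close(p2c_w)
        os.close(c2p_r)
        while True:
            b = os.read(p2c_r, 1)
            if not b:  # EOF: the parent closed its write end
                break
            os.write(c2p_w, b)
        os._exit(0)

    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(loops):
        os.write(p2c_w, b"x")  # wake the child ...
        os.read(c2p_r, 1)      # ... then sleep until it answers
    elapsed = time.perf_counter() - start

    os.close(p2c_w)  # lets the child see EOF and exit
    os.waitpid(pid, 0)
    os.close(c2p_r)
    return elapsed / loops * 1e6


if __name__ == "__main__":
    print("%f usecs/loop" % pingpong())
```

Each loop is two wakeups with nothing to compute in between, which is
why where the wakee lands dominates the result. Calling
pingpong(cpus={3}) versus pingpong(cpus={2, 3}) would roughly mirror the
taskset -c 3 and taskset -c 2,3 runs, assuming those CPU numbers exist
on the machine.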

Bigger issue is that while microbenchmark behavior is consistent,
applications tend to process data and react to it (vs merely batting it
about like playful kittens, cute, but not all that productive), likely
mucking up any heuristic anyone invents with depressing regularity.

-Mike