Re: [PATCH 1/3] sched: remove select_idle_core() for scalability

From: Subhra Mazumdar
Date: Wed May 30 2018 - 18:06:47 EST




On 05/29/2018 02:36 PM, Peter Zijlstra wrote:
On Wed, May 02, 2018 at 02:58:42PM -0700, Subhra Mazumdar wrote:
I re-ran the test after fixing that bug but still get similar regressions
for hackbench
Hackbench process on 2 socket, 44 core and 88 threads Intel x86 machine
(lower is better):
groups  baseline  %stdev  patch             %stdev
1       0.5742    21.13   0.5131 (10.64%)   4.11
2       0.5776    7.87    0.5387 (6.73%)    2.39
4       0.9578    1.12    1.0549 (-10.14%)  0.85
8       1.7018    1.35    1.8516 (-8.8%)    1.56
16      2.9955    1.36    3.2466 (-8.38%)   0.42
32      5.4354    0.59    5.7738 (-6.23%)   0.38
On my IVB-EP (2 socket, 10 core/socket, 2 threads/core):

bench:

perf stat --null --repeat 10 -- perf bench sched messaging -g $i -t -l 10000 2>&1 | grep "seconds time elapsed"

config + results:

ORIG (SIS_PROP, shift=9)

1: 0.557325175 seconds time elapsed ( +- 0.83% )
2: 0.620646551 seconds time elapsed ( +- 1.46% )
5: 2.313514786 seconds time elapsed ( +- 2.11% )
10: 3.796233615 seconds time elapsed ( +- 1.57% )
20: 6.319403172 seconds time elapsed ( +- 1.61% )
40: 9.313219134 seconds time elapsed ( +- 1.03% )

PROP+AGE+ONCE shift=0

1: 0.559497993 seconds time elapsed ( +- 0.55% )
2: 0.631549599 seconds time elapsed ( +- 1.73% )
5: 2.195464815 seconds time elapsed ( +- 1.77% )
10: 3.703455811 seconds time elapsed ( +- 1.30% )
20: 6.440869566 seconds time elapsed ( +- 1.23% )
40: 9.537849253 seconds time elapsed ( +- 2.00% )

FOLD+AGE+ONCE+PONIES shift=0

1: 0.558893325 seconds time elapsed ( +- 0.98% )
2: 0.617426276 seconds time elapsed ( +- 1.07% )
5: 2.342727231 seconds time elapsed ( +- 1.34% )
10: 3.850449091 seconds time elapsed ( +- 1.07% )
20: 6.622412262 seconds time elapsed ( +- 0.85% )
40: 9.487138039 seconds time elapsed ( +- 2.88% )

FOLD+AGE+ONCE+PONIES+PONIES2 shift=0

10: 3.695294317 seconds time elapsed ( +- 1.21% )


Which seems to not hurt anymore.. can you confirm?

Also, I didn't run anything other than hackbench on it so far.

(sorry, the code is a right mess, it's what I ended up with after a day
of poking with no cleanups)

I tested with FOLD+AGE+ONCE+PONIES+PONIES2 shift=0 vs baseline but see some
regression for hackbench and uperf:

hackbench       BL      stdev%  test    stdev%  %gain
1(40 tasks)     0.5816  8.94    0.5607  2.89    3.593535
2(80 tasks)     0.6428  10.64   0.5984  3.38    6.907280
4(160 tasks)    1.0152  1.99    1.0036  2.03    1.142631
8(320 tasks)    1.8128  1.40    1.7931  0.97    1.086716
16(640 tasks)   3.1666  0.80    3.2332  0.48    -2.103207
32(1280 tasks)  5.6084  0.83    5.8489  0.56    -4.288210

Uperf           BL      stdev%  test    stdev%  %gain
8 threads       45.36   0.43    45.16   0.49    -0.433536
16 threads      87.81   0.82    88.6    0.38    0.899669
32 threads      151.18  0.01    149.98  0.04    -0.795925
48 threads      190.19  0.21    184.77  0.23    -2.849681
64 threads      190.42  0.35    183.78  0.08    -3.485217
128 threads     323.85  0.27    266.32  0.68    -17.766089

sysbench        BL        stdev%  test      stdev%  %gain
8 threads       2095.44   1.82    2102.63   0.29    0.343006
16 threads      4218.44   0.06    4179.82   0.49    -0.915413
32 threads      7531.36   0.48    7744.72   0.13    2.832912
48 threads      10206.42  0.20    10144.65  0.19    -0.605163
64 threads      12053.72  0.09    11784.38  0.32    -2.234547
128 threads     14810.33  0.04    14741.78  0.16    -0.462867
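
For anyone re-deriving the %gain columns above, here is a minimal sketch of
the arithmetic. It assumes hackbench reports elapsed time (lower is better,
as stated earlier in the thread) and that the uperf and sysbench numbers are
throughput-like (higher is better), which is what the sign of the %gain
column suggests. The helper name pct_gain is made up for illustration. The
results agree with the tables to within rounding of the displayed BL/test
values.

```python
def pct_gain(baseline, test, lower_is_better):
    """Percent gain of 'test' over 'baseline'; positive means the patch helped.

    Assumption: for time-based results (hackbench) a smaller number is a gain,
    for throughput-based results (uperf, sysbench) a larger number is a gain.
    """
    delta = (baseline - test) if lower_is_better else (test - baseline)
    return 100.0 * delta / baseline


# hackbench, 1 group (40 tasks): elapsed time, lower is better
print(round(pct_gain(0.5816, 0.5607, lower_is_better=True), 2))

# uperf, 128 threads: assumed throughput, higher is better
print(round(pct_gain(323.85, 266.32, lower_is_better=False), 2))
```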

I have a much smaller patch that seems to work well so far on both x86 and
SPARC across the benchmarks I have run. It keeps the idle cpu search bounded
between 1 core's and 2 cores' worth of cpus, and also adds a new sched
feature that controls whether the idle core search is done at all. It can be
on by default, but for workloads like Oracle DB on x86 we can turn it off.
I plan to send that after some more testing.
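
To make the idea concrete, here is a rough userspace sketch of the two
pieces described above: a scan length clamped to between one and two cores'
worth of cpus, and a switch for the idle core search. This is not the patch
itself. The function names, the 'proposed' scan length, and the assumption
that SMT siblings are numbered adjacently are all made up for illustration.

```python
def scan_limit(proposed, smt_width):
    """Clamp a proposed scan length to [1 core, 2 cores] worth of cpus."""
    return max(smt_width, min(proposed, 2 * smt_width))


def find_idle_core(idle, smt_width):
    """Return the first cpu of a fully idle core, or None.

    Assumption: cpus [k*smt_width, (k+1)*smt_width) are siblings of one core.
    This walks every core, which is the part that scales poorly.
    """
    for first in range(0, len(idle), smt_width):
        if all(idle[first:first + smt_width]):
            return first
    return None


def select_idle_cpu(idle, start, proposed, smt_width, idle_core_search=True):
    """Pick an idle cpu: optional idle core search, then a bounded cpu scan."""
    if idle_core_search:
        cpu = find_idle_core(idle, smt_width)
        if cpu is not None:
            return cpu
    n = len(idle)
    # Bounded scan: look at no more than two cores' worth of cpus from 'start'.
    for step in range(min(scan_limit(proposed, smt_width), n)):
        cpu = (start + step) % n
        if idle[cpu]:
            return cpu
    return None


# 4 cores x 2 threads; only core 2 (cpus 4 and 5) is fully idle.
idle = [False, True, False, False, True, True, False, False]
print(select_idle_cpu(idle, start=6, proposed=100, smt_width=2))
print(select_idle_cpu(idle, start=6, proposed=100, smt_width=2,
                      idle_core_search=False))
```

With the idle core search off, the scan can miss an idle cpu that lies
outside the bounded window. That is the cost being traded for a shorter,
more predictable search.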