Re: [patch v4 0/18] sched: simplified fork, release load avg and power awareness scheduling

From: Mike Galbraith
Date: Mon Jan 28 2013 - 01:42:12 EST


On Mon, 2013-01-28 at 07:15 +0100, Mike Galbraith wrote:
> On Mon, 2013-01-28 at 13:51 +0800, Alex Shi wrote:
> > On 01/28/2013 01:17 PM, Mike Galbraith wrote:
> > > On Sun, 2013-01-27 at 16:51 +0100, Mike Galbraith wrote:
> > >> On Sun, 2013-01-27 at 21:25 +0800, Alex Shi wrote:
> > >>> On 01/27/2013 06:35 PM, Borislav Petkov wrote:
> > >>>> On Sun, Jan 27, 2013 at 05:36:25AM +0100, Mike Galbraith wrote:
> > >>>>> With aim7 compute on 4 node 40 core box, I see stable throughput
> > >>>>> improvement at tasks = nr_cores and below w. balance and powersaving.
> > >> ...
> > >>>> Ok, this is sick. How is balance and powersaving better than perf? Both
> > >>>> have much more jobs per minute than perf; is that because we do pack
> > >>>> much more tasks per cpu with balance and powersaving?
> > >>>
> > >>> Maybe it is due to the lazy balancing on balance/powersaving. You can
> > >>> check the CS times in /proc/pid/status.
> > >>
> > >> Well, it's not wakeup path, limiting entry frequency per waker did zip
> > >> squat nada to any policy throughput.
> > >
> > > monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043321 00058616
> > > 043313 00058616
> > > 043318 00058968
> > > 043317 00058968
> > > 043316 00059184
> > > 043319 00059192
> > > 043320 00059048
> > > 043314 00059048
> > > 043312 00058176
> > > 043315 00058184
> > > monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043337 00053448
> > > 043333 00053456
> > > 043338 00052992
> > > 043331 00053448
> > > 043332 00053488
> > > 043335 00053496
> > > 043334 00053480
> > > 043329 00053288
> > > 043336 00053464
> > > 043330 00053496
> > > monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> > > monteverdi:/abuild/mike/:[0]# massive_intr 10 60
> > > 043348 00052488
> > > 043344 00052488
> > > 043349 00052744
> > > 043343 00052504
> > > 043347 00052504
> > > 043352 00052888
> > > 043345 00052504
> > > 043351 00052496
> > > 043346 00052496
> > > 043350 00052304
> > > monteverdi:/abuild/mike/:[0]#
> >
> > similar with aim7 results. Thanks, Mike!
> >
> > Would you like to collect vmstat info in background?
> > >
> > > Zzzt. Wish I could turn turbo thingy off.
> >
> > Do you mean the turbo mode of cpu frequency? I remember some of machine
> > can disable it in BIOS.
>
> Yeah, I can do that in my local x3550 box. I can't fiddle with BIOS
> settings on the remote NUMA box.
>
> This can't be anything but turbo gizmo mucking up the numbers I think,
> not that the numbers are invalid or anything, better numbers are better
> numbers no matter where/how they come about ;-)
>
> The massive_intr load is dirt simple sleep/spin with bean counting. It
> sleeps 1ms spins 8ms. Change that to sleep 8ms, grind away for 1ms...
>
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045150 00006484
> 045157 00006427
> 045156 00006401
> 045152 00006428
> 045155 00006372
> 045154 00006370
> 045158 00006453
> 045149 00006372
> 045151 00006371
> 045153 00006371
> monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045170 00006380
> 045172 00006374
> 045169 00006376
> 045175 00006376
> 045171 00006334
> 045176 00006380
> 045168 00006374
> 045174 00006334
> 045177 00006375
> 045173 00006376
> monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
> monteverdi:/abuild/mike/:[0]# ./massive_intr 10 60
> 045198 00006408
> 045191 00006408
> 045197 00006408
> 045192 00006411
> 045194 00006409
> 045196 00006409
> 045195 00006336
> 045189 00006336
> 045193 00006411
> 045190 00006410

Back to original 1ms sleep, 8ms work, turning NUMA box into a single
node 10 core box with numactl.
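
For anyone who doesn't have massive_intr handy, the sleep/spin pattern
can be sketched roughly like this. This is not the massive_intr source,
only an illustration of the 1ms sleep / 8ms work cycle; the function
name, its parameters and the idea of counting completed cycles are
assumptions made for the example.

```python
import time


def sleep_spin(sleep_s=0.001, spin_s=0.008, runtime_s=1.0):
    """Alternate a short sleep with a busy-wait; return completed cycles.

    Hypothetical stand-in for a sleep/spin load generator. The defaults
    mirror the 1ms sleep / 8ms spin cycle described in the thread.
    """
    cycles = 0
    deadline = time.monotonic() + runtime_s
    while time.monotonic() < deadline:
        # Sleep phase: the task blocks and leaves the runqueue.
        time.sleep(sleep_s)
        # Spin phase: burn CPU until the spin interval has elapsed.
        spin_until = time.monotonic() + spin_s
        while time.monotonic() < spin_until:
            pass
        cycles += 1
    return cycles


if __name__ == "__main__":
    print(sleep_spin())
```

Swapping the two defaults (sleep 8ms, spin 1ms) gives the mostly-idle
variant of the load.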

monteverdi:/abuild/mike/:[0]# echo powersaving > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045286 00043872
045289 00043464
045284 00043488
045287 00043440
045283 00043416
045281 00044456
045285 00043456
045288 00044312
045280 00043048
045282 00043240
monteverdi:/abuild/mike/:[0]# echo balance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045300 00052536
045307 00052472
045304 00052536
045299 00052536
045305 00052520
045306 00052528
045302 00052528
045303 00052528
045308 00052512
045301 00052520
monteverdi:/abuild/mike/:[0]# echo performance > /sys/devices/system/cpu/sched_policy/current_sched_policy
monteverdi:/abuild/mike/:[0]# numactl --cpunodebind=0 massive_intr 10 60
045339 00052600
045340 00052608
045338 00052600
045337 00052608
045343 00052600
045341 00052600
045336 00052608
045335 00052616
045334 00052576
045342 00052600
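
To make the three single-node runs above easier to compare, here is a
small sketch that averages the second column per policy and expresses
each average relative to performance. It assumes the second column is
the per-process work count (the first being the PID); the variable
names are made up for the example, and the numbers are copied from the
numactl runs above.

```python
# Second-column values from the three numactl --cpunodebind=0 runs.
counts = {
    "powersaving": [43872, 43464, 43488, 43440, 43416,
                    44456, 43456, 44312, 43048, 43240],
    "balance":     [52536, 52472, 52536, 52536, 52520,
                    52528, 52528, 52528, 52512, 52520],
    "performance": [52600, 52608, 52600, 52608, 52600,
                    52600, 52608, 52616, 52576, 52600],
}

# Mean count per policy.
means = {policy: sum(vals) / len(vals) for policy, vals in counts.items()}

# Each mean as a fraction of the performance mean.
relative = {policy: m / means["performance"] for policy, m in means.items()}

if __name__ == "__main__":
    for policy in counts:
        print(f"{policy:12s} mean={means[policy]:9.1f} "
              f"vs performance={relative[policy]:.3f}")
```

On one node, balance and performance land within a fraction of a
percent of each other, while powersaving comes out noticeably lower.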
