Re: [RFC PATCH 0/2] sched: simplify the select_task_rq_fair()

From: Michael Wang
Date: Wed Jan 23 2013 - 02:10:15 EST


On 01/23/2013 02:28 PM, Mike Galbraith wrote:
> On Wed, 2013-01-23 at 13:09 +0800, Michael Wang wrote:
>> On 01/23/2013 12:31 PM, Mike Galbraith wrote:
>
>>> Another thing that wants fixing: root can set flags for _existing_
>>> domains any way he likes,
>>
>> Can he? Changing the domain flags at runtime? I remember sending out a
>> patch to do that, but it was refused as too dangerous...
>
> Yes, flags can be set any way you like, which works just fine when flags
> are evaluated at runtime.
>
> WRT dangerous: if root says "Let there be stupidity", stupidity should
> appear immediately :)
>
>>> but when he invokes godly powers to rebuild
>>> domains, he gets what's hard coded, which is neither clever (godly
>>> wrath;), nor wonderful for godly runtime path decisions.
>>
>> The purpose is to use a map to describe the sd topology of a cpu; it
>> should be rebuilt correctly according to the new topology when a new
>> domain is attached to a cpu.
>
> Try turning FORK/EXEC/WAKE on/off.
>
> echo [01] > [cpuset]/sched_load_balance will rebuild, but the resulting
> domains won't reflect your flag change.

Yeah, I tested this previously, but I failed to reach the rebuild
procedure; it needs more research.

>
>> For this case, it's really strange that level 2 is missing from the
>> topology. I found that in build_sched_domains() the levels are added one
>> by one, and I don't know why it skips one here... sounds like a BUG to me.
>>
>> Anyway, the sbm should still work properly by design, even in such a
>> strange topology, if it's initialized correctly.
>>
>> The patch below should help with that; it is based on the original patch set.
>>
>> Could you please give it a try? It's supposed to make the balance path
>> behave correctly. Please apply the DEBUG patch below too, so we can see
>> how it changes things. I think this time we may be able to solve the
>> issue the right way ;-)
>
> Done, previous changes backed out, new change applied on top of v2 set.
> Full debug output attached.
>
> Domain flags on this box (bogus CPU domain is still patched away).
>
> monteverdi:/abuild/mike/aim7/:[127]# tune-sched-domains
> usage: tune-sched-domains <val>
> {cpu0/domain0:SIBLING} SD flag: 687
> + 1: SD_LOAD_BALANCE: Do load balancing on this domain
> + 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
> + 4: SD_BALANCE_EXEC: Balance on exec
> + 8: SD_BALANCE_FORK: Balance on fork, clone
> - 16: SD_BALANCE_WAKE: Wake to idle CPU on task wakeup
> + 32: SD_WAKE_AFFINE: Wake task to waking CPU
> - 64: SD_PREFER_LOCAL: Prefer to keep tasks local to this domain
> + 128: SD_SHARE_CPUPOWER: Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
> + 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
> -1024: SD_SERIALIZE: Only a single load balancing instance
> -2048: SD_ASYM_PACKING: Place busy groups earlier in the domain
> -4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
> -8192: SD_PREFER_UTILIZATION: Prefer utilization over SMP nice
> {cpu0/domain1:MC} SD flag: 559
> + 1: SD_LOAD_BALANCE: Do load balancing on this domain
> + 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
> + 4: SD_BALANCE_EXEC: Balance on exec
> + 8: SD_BALANCE_FORK: Balance on fork, clone
> - 16: SD_BALANCE_WAKE: Wake to idle CPU on task wakeup
> + 32: SD_WAKE_AFFINE: Wake task to waking CPU
> - 64: SD_PREFER_LOCAL: Prefer to keep tasks local to this domain
> - 128: SD_SHARE_CPUPOWER: Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
> + 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
> -1024: SD_SERIALIZE: Only a single load balancing instance
> -2048: SD_ASYM_PACKING: Place busy groups earlier in the domain
> -4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
> -8192: SD_PREFER_UTILIZATION: Prefer utilization over SMP nice
> {cpu0/domain2:NUMA} SD flag: 9263
> + 1: SD_LOAD_BALANCE: Do load balancing on this domain
> + 2: SD_BALANCE_NEWIDLE: Balance when about to become idle
> + 4: SD_BALANCE_EXEC: Balance on exec
> + 8: SD_BALANCE_FORK: Balance on fork, clone
> - 16: SD_BALANCE_WAKE: Wake to idle CPU on task wakeup
> + 32: SD_WAKE_AFFINE: Wake task to waking CPU
> - 64: SD_PREFER_LOCAL: Prefer to keep tasks local to this domain
> - 128: SD_SHARE_CPUPOWER: Domain members share cpu power
> - 256: SD_POWERSAVINGS_BALANCE: Balance for power savings
> - 512: SD_SHARE_PKG_RESOURCES: Domain members share cpu pkg resources
> +1024: SD_SERIALIZE: Only a single load balancing instance
> -2048: SD_ASYM_PACKING: Place busy groups earlier in the domain
> -4096: SD_PREFER_SIBLING: Prefer to place tasks in a sibling domain
> +8192: SD_PREFER_UTILIZATION: Prefer utilization over SMP nice

I will study this BUG candidate later.
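
For reference, each "SD flag" value in the listing above is just the sum of
the bits marked "+". A minimal sketch for decoding such a value, assuming the
bit assignments printed by tune-sched-domains on this box (they are copied
from the listing and are not guaranteed to match other kernel versions; the
helper name decode_sd_flags is made up for illustration):

```python
# Bit values copied from the tune-sched-domains listing in this mail.
# They are specific to this kernel and may differ in other versions.
SD_FLAGS = {
    1: "SD_LOAD_BALANCE",
    2: "SD_BALANCE_NEWIDLE",
    4: "SD_BALANCE_EXEC",
    8: "SD_BALANCE_FORK",
    16: "SD_BALANCE_WAKE",
    32: "SD_WAKE_AFFINE",
    64: "SD_PREFER_LOCAL",
    128: "SD_SHARE_CPUPOWER",
    256: "SD_POWERSAVINGS_BALANCE",
    512: "SD_SHARE_PKG_RESOURCES",
    1024: "SD_SERIALIZE",
    2048: "SD_ASYM_PACKING",
    4096: "SD_PREFER_SIBLING",
    8192: "SD_PREFER_UTILIZATION",
}


def decode_sd_flags(value):
    """Return the names of all flags set in value, lowest bit first."""
    return [name for bit, name in sorted(SD_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    # The three domain levels reported for cpu0 in the listing.
    for label, value in (("SIBLING", 687), ("MC", 559), ("NUMA", 9263)):
        print(label, value, " ".join(decode_sd_flags(value)))
```

Decoded this way, none of the three levels has SD_BALANCE_WAKE set, which
matches the "- 16" lines in the listing.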

>
> Abbreviated test run:
> Tasks    jobs/min  jti  jobs/min/task      real       cpu
>   640   158044.01   81       246.9438     24.54    577.66   Wed Jan 23 07:14:33 2013
>  1280    50434.33   39        39.4018    153.80   5737.57   Wed Jan 23 07:17:07 2013
>  2560    47214.07   34        18.4430    328.58  12715.56   Wed Jan 23 07:22:36 2013

So it still doesn't work... Not entering the balance path on wakeup would
fix it, and it looks like that's the only choice if no error can be found
in the balance path... the benchmark wins again, I'm feeling bad...

I will summarize the info we collected and send a v3 later.

Regards,
Michael Wang

