Re: [discussion] simpler load balance in scheduler

From: Alex Shi
Date: Tue Jan 21 2014 - 03:52:24 EST


On 01/20/2014 11:04 AM, Paul E. McKenney wrote:
> On Mon, Jan 06, 2014 at 09:44:36PM +0800, Alex Shi wrote:
>> On 12/18/2013 12:32 AM, Paul E. McKenney wrote:
>>> On Fri, Dec 13, 2013 at 06:09:47PM +0800, Alex Shi wrote:
>
> [ . . . ]
>
>>> 3. Allow the exported values to become inaccurate, and resample
>>> the actual values remotely if extrapolated values indicate
>>> that action is warranted.
>>
>> It is a very heuristic idea! Could you give a few more hints/clues on how to
>> get the remote CPU load from an extrapolated value? I know RCU uses this
>> approach wonderfully, but I still don't have much idea how to get the live
>> CPU load...
>
> Well, as long as the CPU continues doing the same thing, for example,
> being idle or running a user-mode task, the extrapolation should be
> exact, right? The load value was X the last time the CPU changed state,
> and T time has passed since then, so you can calculate it exactly.

It's a good idea that I had never thought of before. Thanks a lot!
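
To make the extrapolation concrete, here is a minimal user-space sketch.
It assumes "load" is modeled as accumulated busy time, which is only an
illustration and not the scheduler's actual load metric. The struct and
function names (load_snapshot, extrapolate_busy_ns) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU record, written only when the CPU changes state. */
struct load_snapshot {
	uint64_t busy_ns;	/* accumulated busy time as of stamp_ns */
	uint64_t stamp_ns;	/* time of the last state change */
	bool running;		/* true if the CPU has been busy since stamp_ns */
};

/*
 * Extrapolate the accumulated busy time to now_ns.  The result is exact
 * only if the CPU has not changed state since stamp_ns.
 */
static uint64_t extrapolate_busy_ns(const struct load_snapshot *s,
				    uint64_t now_ns)
{
	assert(now_ns >= s->stamp_ns);
	uint64_t elapsed = now_ns - s->stamp_ns;

	/* A busy CPU accumulates all of the elapsed time; an idle one none. */
	return s->running ? s->busy_ns + elapsed : s->busy_ns;
}
```

A remote CPU would call extrapolate_busy_ns() with its own clock reading
and resample the real value only when the extrapolated one suggests that
balancing is warranted.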
>
> The exact method for detecting inaccuracies depends on how and where
> you are calculating the load values. If you are calculating them on
> each state change (as is done for some values for NO_HZ_FULL), then you
> simply need sufficient synchronization for getting a consistent snapshot
> of several values. One easy way to do this is via a per-CPU seqlock.
> The state-change code write-acquires the seqlock, while those doing
> extrapolation read-acquire it and retry if changes occur. This can have
> problems if too many values are required and if changes occur too fast,
> but such problems can be addressed should they occur.

I thought about the seqlock, but it is clearly not scalable.
Anyway, load balancing doesn't need to be very accurate, so maybe atomic
operations on the exported per-CPU load during balancing are acceptable.
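
For comparison, the per-CPU seqlock scheme described above could be
sketched in user-space C11 roughly as follows. This is not the kernel's
seqlock_t API; the names (load_export, load_export_update,
load_export_read) are hypothetical, a single writer per CPU is assumed,
and sequentially consistent atomics are used for simplicity rather than
speed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical exported per-CPU values, guarded by a sequence counter. */
struct load_export {
	atomic_uint seq;		/* odd while an update is in progress */
	_Atomic uint64_t busy_ns;
	_Atomic uint64_t stamp_ns;
	atomic_bool running;
};

/* Plain copy handed back to readers. */
struct load_view {
	uint64_t busy_ns;
	uint64_t stamp_ns;
	bool running;
};

/* Writer side: called by the owning CPU on each state change. */
static void load_export_update(struct load_export *e, uint64_t busy_ns,
			       uint64_t stamp_ns, bool running)
{
	atomic_fetch_add(&e->seq, 1);	/* counter becomes odd */
	atomic_store(&e->busy_ns, busy_ns);
	atomic_store(&e->stamp_ns, stamp_ns);
	atomic_store(&e->running, running);
	atomic_fetch_add(&e->seq, 1);	/* counter becomes even again */
}

/* Reader side: retry until a snapshot is taken with no update in between. */
static struct load_view load_export_read(struct load_export *e)
{
	struct load_view v;
	unsigned int before, after;

	do {
		before = atomic_load(&e->seq);
		v.busy_ns = atomic_load(&e->busy_ns);
		v.stamp_ns = atomic_load(&e->stamp_ns);
		v.running = atomic_load(&e->running);
		after = atomic_load(&e->seq);
	} while ((before & 1u) || before != after);

	return v;
}
```

If only a single word needs to be exported, one atomic load and store
would be enough and the retry loop goes away, at the cost of not getting
a consistent view of several values.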
>
> Does that help?

Yes, very helpful! :)
>
> Thanx, Paul
>
>>> There are probably other approaches. I am being quite general here
>>> because I don't have the full picture of the scheduler statistics
>>> in my head. It is likely possible to obtain a much better approach
>>> by considering the scheduler's specifics.
>>>
>>>>> BTW, to reduce unnecessary remote info fetching, we can use the current
>>>>> idle_cpus_mask in nohz and simply skip the idle CPUs in this cpumask.
>>
>> [..]
>>>
>>> Thanx, Paul
>>>
>>>>> 4. From a power-saving POV, top-down gives the whole-system CPU topology
>>>>> info directly. So besides reducing CS, it can reduce the interference of
>>>>> a transition task with idle CPUs, and let idle CPUs sleep better.
>>
>> --
>> Thanks
>> Alex
>>
>


--
Thanks
Alex