Re: [discussion] simpler load balance in scheduler

From: Paul E. McKenney
Date: Sun Jan 19 2014 - 22:05:36 EST


On Mon, Jan 06, 2014 at 09:44:36PM +0800, Alex Shi wrote:
> On 12/18/2013 12:32 AM, Paul E. McKenney wrote:
> > On Fri, Dec 13, 2013 at 06:09:47PM +0800, Alex Shi wrote:

[ . . . ]

> > 3. Allow the exported values to become inaccurate, and resample
> > the actual values remotely if extrapolated values indicate
> > that action is warranted.
>
> It is a very heuristic idea! Could you give a few more hints/clues on
> getting remote cpu load from an extrapolated value? I know RCU uses this
> approach wonderfully, but I still don't have much idea how to get live
> cpu load...

Well, as long as the CPU continues doing the same thing, for example,
being idle or running a user-mode task, the extrapolation should be
exact, right? The load value was X the last time the CPU changed state,
and T time has passed since then, so you can calculate it exactly.
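
As a rough illustration of that extrapolation, here is a minimal sketch.
It assumes that "load" means accumulated busy time in nanoseconds, which
is only one of several possible definitions. The struct cpu_sample type,
its fields, and extrapolate_busy_ns() are hypothetical names made up for
this example, not existing scheduler code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU record, written only when the CPU changes state. */
struct cpu_sample {
	uint64_t busy_ns;        /* busy time accumulated up to last_change_ns */
	uint64_t last_change_ns; /* timestamp of the most recent state change */
	bool     busy;           /* state the CPU entered at last_change_ns */
};

/*
 * Extrapolate the accumulated busy time to now_ns without disturbing the
 * remote CPU.  The result is exact as long as the CPU has not changed
 * state since last_change_ns.
 */
static uint64_t extrapolate_busy_ns(const struct cpu_sample *s, uint64_t now_ns)
{
	/* Clamp in case the caller's clock reading predates the sample. */
	uint64_t elapsed = now_ns > s->last_change_ns ?
			   now_ns - s->last_change_ns : 0;

	return s->busy ? s->busy_ns + elapsed : s->busy_ns;
}
```

An idle CPU contributes nothing for the elapsed interval and a busy CPU
contributes all of it, which is why no remote sampling is needed between
state changes.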

The exact method for detecting inaccuracies depends on how and where
you are calculating the load values. If you are calculating them on
each state change (as is done for some values for NO_HZ_FULL), then you
simply need sufficient synchronization for getting a consistent snapshot
of several values. One easy way to do this is via a per-CPU seqlock.
The state-change code write-acquires the seqlock, while those doing
extrapolation read-acquire it and retry if changes occur. This can have
problems if too many values are required and if changes occur too fast,
but such problems can be addressed should they occur.
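
The per-CPU seqlock pattern might be sketched as follows. This is a
userspace analogue using C11 atomics, not kernel code; in the kernel the
corresponding primitives are write_seqlock()/write_sequnlock() and
read_seqbegin()/read_seqretry(). The sketch assumes a single writer (the
owning CPU), so no writer-side lock is taken, and it again assumes that
load is accumulated busy time. The struct cpu_load_snap type and the
snap_*() functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot protected by a sequence counter. */
struct cpu_load_snap {
	atomic_uint      seq;            /* odd while an update is in flight */
	_Atomic uint64_t busy_ns;        /* busy time up to last_change_ns */
	_Atomic uint64_t last_change_ns; /* time of the last state change */
	atomic_bool      busy;           /* state entered at last_change_ns */
};

static void snap_init(struct cpu_load_snap *s)
{
	atomic_init(&s->seq, 0);
	atomic_init(&s->busy_ns, 0);
	atomic_init(&s->last_change_ns, 0);
	atomic_init(&s->busy, false);
}

/* Writer side: run by the owning CPU on each state change. */
static void snap_update(struct cpu_load_snap *s, uint64_t now_ns, bool now_busy)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);
	uint64_t busy_ns = atomic_load_explicit(&s->busy_ns, memory_order_relaxed);
	uint64_t last = atomic_load_explicit(&s->last_change_ns, memory_order_relaxed);

	/* Close out the interval that is ending, if it was a busy one. */
	if (atomic_load_explicit(&s->busy, memory_order_relaxed))
		busy_ns += now_ns - last;

	/* Make seq odd, then order that store before the data stores. */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&s->busy_ns, busy_ns, memory_order_relaxed);
	atomic_store_explicit(&s->last_change_ns, now_ns, memory_order_relaxed);
	atomic_store_explicit(&s->busy, now_busy, memory_order_relaxed);

	/* Make seq even again; the release publishes the data stores. */
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: run by any CPU; retries if an update raced with the reads. */
static uint64_t snap_extrapolate(struct cpu_load_snap *s, uint64_t now_ns)
{
	unsigned int seq0, seq1;
	uint64_t busy_ns, last;
	bool busy;

	do {
		seq0 = atomic_load_explicit(&s->seq, memory_order_acquire);
		busy_ns = atomic_load_explicit(&s->busy_ns, memory_order_relaxed);
		last = atomic_load_explicit(&s->last_change_ns, memory_order_relaxed);
		busy = atomic_load_explicit(&s->busy, memory_order_relaxed);
		/* Order the data loads before re-reading the counter. */
		atomic_thread_fence(memory_order_acquire);
		seq1 = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((seq0 & 1) || seq0 != seq1);

	/* Clamp in case now_ns predates the snapshot. */
	return busy && now_ns > last ? busy_ns + (now_ns - last) : busy_ns;
}
```

The retry loop is where the "too many values" and "changes occur too
fast" concerns show up: the more fields the reader must load and the more
often the writer runs, the more likely a given pass is to be discarded.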

Does that help?

Thanx, Paul

> > There are probably other approaches. I am being quite general here
> > because I don't have the full picture of the scheduler statistics
> > in my head. It is likely possible to obtain a much better approach
> > by considering the scheduler's specifics.
> >
> >>> BTW, to reduce unnecessary remote info fetching, we can use current
> >>> idle_cpus_mask in nohz, we just skip the idle cpu in this cpumask simply.
>
> [..]
> >
> > Thanx, Paul
> >
> >>> 4, From power saving POV, top-down give the whole system cpu topology
> >>> info directly. So beside the CS reducing, it can reduce the idle cpu
> >>> interfere by a transition task. and let idle cpu sleep better.
>
> --
> Thanks
> Alex
>
