Re: [RFC PATCH] sched: wakeup buddy

From: Mike Galbraith
Date: Thu Feb 28 2013 - 03:04:25 EST


On Thu, 2013-02-28 at 15:40 +0800, Michael Wang wrote:
> Hi, Mike
>
> Thanks for your reply.
>
> On 02/28/2013 03:18 PM, Mike Galbraith wrote:
> > On Thu, 2013-02-28 at 14:38 +0800, Michael Wang wrote:
> >
> >> + /*
> >> + * current is the only task on rq and it is
> >> + * going to sleep, current cpu will be a nice
> >> + * candidate for p to run on.
> >> + */
> >
> > The sync hint only means it might be going to sleep soon, and even then,
> > there can still be enough execution overlap to be a win to schedule
> > cross core. Sched pipe numbers will always be much prettier if you do
> > wakeup cpu affine, as it's ~100% scheduler and ~100% sync.
>
> Hmm.. so it's the comparison between 'cache benefit - execution overlap'
> and 'latency - execution overlap'?

Yeah. You'll always lose power going cross core; whether throughput
breaks even or wins depends on how much of the overlap is convertible,
and on what the L2 misses etc cost. For sched pipe there is no win, but
for other sync hint users there is.
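
As a toy illustration of that break-even (not kernel code; the function
name, parameters and any numbers fed to it are hypothetical, nothing here
is measured), the tradeoff can be sketched as overlap gained minus cache
miss cost paid:

```c
/*
 * Toy model only, not kernel code.
 *
 * overlap_ns: waker/wakee execution that can really run in parallel
 *             when the wakee is placed on another core (assumption)
 * miss_ns:    extra cache miss cost paid for running cross core
 *             (assumption)
 *
 * Returns the net gain in ns per wakeup; negative means cross core loses.
 */
static long cross_core_gain_ns(long overlap_ns, long miss_ns)
{
	return overlap_ns - miss_ns;
}
```

With no convertible overlap, as in a pipe ping-pong, the result is just
the miss cost with a minus sign. With enough overlap it turns positive.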

> I could not estimate how much latency will be added by waiting for current
> to go to sleep (it should be faster than accessing cold data, shouldn't it?),
> but I really like the cache benefit, unless sync doesn't mean current
> is going to sleep every time, but that's the promise of WF_SYNC, isn't it?

It would be nice if it _were_ a promise, but it is not, it's a hint.
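
A minimal sketch of the difference, assuming a made-up placement helper
(toy_wake_affine() and its parameters are hypothetical, this is not the
scheduler's actual code): a hint only biases the decision by discounting
the waker's load, it does not decide on its own.

```c
#include <stdbool.h>

/*
 * Hypothetical toy, not the real wake_affine().
 *
 * this_load:  load on the waker's cpu, including the waker itself
 * prev_load:  load on the cpu the wakee last ran on
 * waker_load: the waker's own contribution to this_load
 * sync:       the wakeup carried the sync hint
 *
 * Returns true if the wakee should be pulled to the waker's cpu.
 */
static bool toy_wake_affine(unsigned long this_load, unsigned long prev_load,
			    unsigned long waker_load, bool sync)
{
	/* Hint: the waker _may_ sleep soon, so discount its load. */
	if (sync)
		this_load = this_load > waker_load ? this_load - waker_load : 0;

	/* Loads are still compared; the hint is not taken as gospel. */
	return this_load <= prev_load;
}
```

Treating the hint as a promise would amount to returning true whenever
sync is set, no matter what else is running on the waker's cpu.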

> > You may lose
> > a lot on other stuff if you interpret the hint as gospel truth.
>
> Could you please give more details on this point?

tbench, mysql+oltp, and on and on use the sync hint. Many things that
jabber on localhost use the sync hint, and have been shown in cold hard
numbers to benefit, some of them massively, from cross core scheduling.
You lose for sure at extreme context switch rates, but the rate has to
be pretty darn high to be a guaranteed loser.

That's why select_idle_sibling() is so very damn annoying.

> > IMHO, sched pipe is a "how fat have I become" benchmark, not "how well
> > do I perform". The scheduler performs well when it makes more work
> > happen. Playing ping-pong with yourself is _exercise_, not a job :)
>
> That's right, maybe I'm using the wrong description. It's the ops/sec
> which has been doubled; that means 'fat', correct?

In this case, it means you're not running a kernel with nohz on a chain,
running two schedulers is more expensive than running one, and missing
L2 each and every time hurts very badly when the load is ultra skinny.
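
For reference, an ultra skinny load of this kind boils down to something
like the sketch below. It is a hypothetical helper written for
illustration (pipe_pingpong() is not taken from any benchmark's source):
two tasks bounce one int over a pair of pipes, and each side blocks right
after it writes, so there is next to no overlap to convert.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: bounce a token between parent and child.
 * Returns the number of completed round trips, or -1 on setup failure.
 */
static int pipe_pingpong(int loops)
{
	int ptoc[2], ctop[2];	/* parent->child and child->parent pipes */
	int i, token = 0;
	pid_t pid;

	if (pipe(ptoc) < 0)
		return -1;
	if (pipe(ctop) < 0) {
		close(ptoc[0]);
		close(ptoc[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(ptoc[0]);
		close(ptoc[1]);
		close(ctop[0]);
		close(ctop[1]);
		return -1;
	}

	if (pid == 0) {
		/* child: wait for the token, hand it straight back */
		close(ptoc[1]);
		close(ctop[0]);
		for (i = 0; i < loops; i++) {
			if (read(ptoc[0], &token, sizeof(token)) != sizeof(token))
				_exit(1);
			if (write(ctop[1], &token, sizeof(token)) != sizeof(token))
				_exit(1);
		}
		_exit(0);
	}

	/* parent: send the token, then sleep until it comes back */
	close(ptoc[0]);
	close(ctop[1]);
	for (i = 0; i < loops; i++) {
		if (write(ptoc[1], &i, sizeof(i)) != sizeof(i))
			break;
		if (read(ctop[0], &token, sizeof(token)) != sizeof(token))
			break;
	}
	close(ptoc[1]);
	close(ctop[0]);
	waitpid(pid, NULL, 0);

	return i;
}
```

Timing a call such as pipe_pingpong(1000000) with both tasks pinned to one
cpu, and again with them on different cores, is one way to see the cost
described above on a given box.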

-Mike
