Re: [PATCH [RT] 08/14] add a loop counter based timeout mechanism

From: Sven-Thorsten Dietrich
Date: Sat Feb 23 2008 - 02:37:55 EST

Next message: Ingo Molnar: "Re: Regression [Was: Boot hang with stack protector on x86_64]"
Previous message: Len Brown: "Re: [2.6 patch] drivers/thermal/thermal.c: fix a check-after-use"
In reply to: Peter W. Morreale: "Re: [PATCH [RT] 08/14] add a loop counter based timeout mechanism"
Next in thread: Peter W. Morreale: "Re: [PATCH [RT] 08/14] add a loop counter based timeout mechanism"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, 2008-02-22 at 13:36 -0700, Peter W. Morreale wrote:
> On Fri, 2008-02-22 at 11:55 -0800, Sven-Thorsten Dietrich wrote:
> >
> > In high-contention, short-hold time situations, it may even make sense
> > to have multiple CPUs with multiple waiters spinning, depending on
> > hold-time vs. time to put a waiter to sleep and wake them up.
> >
> > The wake-up side could also walk ahead on the queue, and bring up
> > spinners from sleeping, so that they are all ready to go when the lock
> > flips green for them.
> >
>
> I did try an attempt at this one time. The logic was merely if the
> pending owner was running, wakeup the next waiter. The results were
> terrible for the benchmarks used, as compared to the current
> implementation.

Yup, but you cut the CONTEXT where I said:

"for very large SMP systems"

Specifically, I mean an SMP system with enough CPUs to do the
following:

let (t_Tcs) be the time to lock, transition and unlock an un-contended
critical section (i.e. the one that I am the pending waiter for).

let (t_W) be the time to wake up a sleeping task.

and let (t_W > t_Tcs)

Then, "for very large SMP systems"

if

S = (t_W / t_Tcs),

then S designates the number of tasks that can transition the critical
section in the time it takes the first sleeper to wake up,

and the number of CPUs > S.

The time for an arbitrary number of tasks N > S, all competing for
lock L, to transition the critical section (T_N_cs) approaches:

T_N_cs = (N * t_W)

if you have only 1 task spinning.

but if you can have N tasks spinning, (T_N_cs) approaches:

T_N_cs = (N * t_Tcs)

and with the premise that t_W > t_Tcs, you should see a dramatic
throughput improvement when running PREEMPT_RT on VERY LARGE SMP
systems.

I want to disclaim that the math above is very much simplified, but I
hope it's sufficient to demonstrate the concept.
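
To make the arithmetic concrete, here is a minimal sketch of the
simplified model above. The function names and the sample timings
(t_wake = 10.0, t_cs = 1.0, 64 tasks) are hypothetical and chosen only
for illustration; they are not measurements, and the model ignores
everything the simplification ignores (rq lock contention, cache
effects, scheduling latency).

```python
# Simplified model only: all names and sample numbers are illustrative
# assumptions, not measurements from any real system.

def spinner_ratio(t_wake, t_cs):
    """S = t_W / t_Tcs: critical sections that fit into one wake-up."""
    if t_wake <= 0 or t_cs <= 0:
        raise ValueError("times must be positive")
    return t_wake / t_cs


def total_time_one_spinner(n_tasks, t_wake):
    """T_N_cs ~ N * t_W: every handoff pays for waking a sleeper."""
    return n_tasks * t_wake


def total_time_all_spinning(n_tasks, t_cs):
    """T_N_cs ~ N * t_Tcs: every waiter is already running on a CPU."""
    return n_tasks * t_cs


if __name__ == "__main__":
    t_wake = 10.0   # hypothetical wake-up cost, arbitrary time units
    t_cs = 1.0      # hypothetical uncontended lock/transition/unlock cost
    n_tasks = 64    # N, assumed to be > S and <= number of CPUs

    s = spinner_ratio(t_wake, t_cs)
    slow = total_time_one_spinner(n_tasks, t_wake)
    fast = total_time_all_spinning(n_tasks, t_cs)
    print("S =", s)
    print("one spinner:  T_N_cs ~", slow)
    print("all spinning: T_N_cs ~", fast)
    print("ratio =", slow / fast)
```

In this model the ratio of the two totals is (N * t_W) / (N * t_Tcs),
which reduces to S regardless of N, so the projected gain depends only
on how much longer a wake-up takes than the critical section itself.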

I have to acknowledge Ingo's comments that this is all suspect until
proven to make a positive difference in "non-marketing" workloads.

I personally *think* we are past that already, and the adaptive concept
can and will be extended and scaled as M-socket and N-core based SMP
proliferates into larger grid-based systems. But there is plenty more
to do to prove it.

(someone send me a 1024 CPU box and a wind-powered-generator)

Sven

>
> What this meant was that virtually every unlock performed a wakeup, if
> not for the new pending owner, then for the next-in-line waiter.
>
> My impression at the time was that the contention for the rq lock is
> significant, regardless of even if the task being woken up was already
> running.
>
> I can generate numbers if that helps.
>
> -PWM
>
>
>
