Re: [PATCH] sched: change pulling RT task to be pulling the highest-prio run-queue first

From: Hillf Danton
Date: Fri Jun 03 2011 - 11:11:38 EST


On Tue, May 31, 2011 at 11:00 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Sat, 2011-05-28 at 22:34 +0800, Hillf Danton wrote:
>> Currently, RT tasks are pulled from one overloaded run-queue after another.
>> Change this to pull tasks from the highest-prio run-queue first.
>
> First off, a change like this requires a rationale, preferably in the
> form of benchmarks and test cases that demonstrate the problems of
> the current scheduler and show us that these changes improve the
> situation.
>
> There is neither a rationale nor any benchmark that explains why this is
> better than the current method.
>

Hi Steven

Thanks for your review, which points out what the patch is missing: a test case.


>>
>> A new function, cpupri_find_prio(), is added to ease pulling in priority order.
>>
>> Signed-off-by: Hillf Danton <dhillf@xxxxxxxxx>
>> ---
>>
>> --- tip-git/kernel/sched_rt.c Sun May 22 20:12:01 2011
>> +++ sched_rt.c     Sat May 28 21:24:13 2011
>> @@ -1434,18 +1434,33 @@ static void push_rt_tasks(struct rq *rq)
>>                ;
>>  }
>>
>> +static DEFINE_PER_CPU(cpumask_var_t, high_cpu_mask);
>> +
>>  static int pull_rt_task(struct rq *this_rq)
>>  {
>>        int this_cpu = this_rq->cpu, ret = 0, cpu;
>>        struct task_struct *p;
>>        struct rq *src_rq;
>> +      struct cpumask *high_mask = __get_cpu_var(high_cpu_mask);
>> +      int prio = 0;
>>
>>        if (likely(!rt_overloaded(this_rq)))
>>                return 0;
>> +loop:
>> +      if (! (prio < this_rq->rt.highest_prio.curr))
>> +              return ret;
>> +
>> +      if (! cpupri_find_prio(&this_rq->rd->cpupri, prio,
>> +                              this_rq->rd->rto_mask, high_mask)) {
>> +              prio++;
>> +              goto loop;
>> +      }
>
> This loop looks to be expensive in the hot path.
>

You are right: in the worst case the added overhead is
this_rq->rt.highest_prio.curr bit tests like

if (cp->pri_active[task_prio / BITS_PER_LONG] &
(1UL << ((BITS_PER_LONG - 1) - (task_prio % BITS_PER_LONG)))) {

which I think slows down the hot path a lot :/
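
To make the worst case concrete, here is a rough user-space sketch (not
kernel code) of such a scan. The bit layout follows the test quoted above.
NR_PRIO, struct prio_bitmap, prio_set(), prio_test() and scan_prio() are
hypothetical names used only for illustration, and the value 102 for the
size of the priority space is an assumption.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))
#define NR_PRIO       102 /* assumed size of the priority space */
#define PRIO_WORDS    ((NR_PRIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for the pri_active bitmap. */
struct prio_bitmap {
	unsigned long pri_active[PRIO_WORDS];
};

/* Mark one priority level as active, using the layout from the bit test above. */
void prio_set(struct prio_bitmap *bm, int prio)
{
	assert(prio >= 0 && prio < NR_PRIO);
	bm->pri_active[prio / BITS_PER_LONG] |=
		1UL << ((BITS_PER_LONG - 1) - (prio % BITS_PER_LONG));
}

/* The single bit test that the added loop repeats. */
int prio_test(const struct prio_bitmap *bm, int prio)
{
	assert(prio >= 0 && prio < NR_PRIO);
	return (bm->pri_active[prio / BITS_PER_LONG] &
		(1UL << ((BITS_PER_LONG - 1) - (prio % BITS_PER_LONG)))) != 0;
}

/*
 * Walk prio = 0 .. limit - 1 and stop at the first active level.
 * Returns that level, or -1 if none is active below limit.
 * *tests counts how many bit tests were performed.
 */
int scan_prio(const struct prio_bitmap *bm, int limit, int *tests)
{
	int prio;

	for (prio = 0; prio < limit; prio++) {
		(*tests)++;
		if (prio_test(bm, prio))
			return prio;
	}
	return -1;
}
```

With an empty bitmap and limit = 50, the scan performs 50 bit tests before
giving up, which is the per-call overhead discussed above.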

> Note, in practice, not many RT tasks are running at the same time. If
> this is not the case, then please explain what situation has multiple RT
> tasks contending for more than one CPU where RT tasks are forced to
> migrate continuously, and this patch fixes the situation.
>

The situation is hard to construct; I guess it is only captured by
rt_overloaded().


> I understand that the current code looks a bit expensive, as it loops
> through the CPUs that are overloaded, and pulls over the RT tasks
> waiting to run that are of higher priority than the one currently running
> on this CPU. If it picks wrong, it could potentially pull over more than
> one task.
>
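
A toy user-space model of the behavior described above may help when
comparing the two strategies. It is only a sketch: struct model_rq,
pull_in_cpu_order() and pull_highest_first() are hypothetical names, a lower
prio value means a higher priority, and the assumption that the
highest-prio-first variant stops after one pull is mine, since the quoted
diff does not show that part.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a run queue; lower prio value = higher priority. */
struct model_rq {
	int overloaded; /* has an RT task waiting that could migrate */
	int next_prio;  /* priority of that waiting task */
};

/*
 * Walk the queues in CPU order and pull every waiting task that beats
 * the best priority seen so far on this CPU. Returns the number of pulls.
 */
int pull_in_cpu_order(struct model_rq *rqs, size_t n, int *this_prio)
{
	int pulled = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!rqs[i].overloaded)
			continue;
		if (rqs[i].next_prio < *this_prio) {
			*this_prio = rqs[i].next_prio;
			rqs[i].overloaded = 0;
			pulled++;
		}
	}
	return pulled;
}

/*
 * Find the queue with the highest-priority waiting task first and pull
 * only from it (assumed behavior). Returns the number of pulls (0 or 1).
 */
int pull_highest_first(struct model_rq *rqs, size_t n, int *this_prio)
{
	size_t best = n;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!rqs[i].overloaded || rqs[i].next_prio >= *this_prio)
			continue;
		if (best == n || rqs[i].next_prio < rqs[best].next_prio)
			best = i;
	}
	if (best == n)
		return 0;
	*this_prio = rqs[best].next_prio;
	rqs[best].overloaded = 0;
	return 1;
}
```

In this model, three overloaded queues with waiting priorities 30, 20 and 10
and a local priority of 40 cost three pulls in CPU order but one pull with
highest-first. Whether that ordering occurs often enough to matter is
exactly what a trace or benchmark would have to show.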
> But in practice (and I've traced this a while back), it seldom ever
> happens.
>
> But if you see that this code is hitting the slow path constantly, and
> your code shows better performance, and you can demonstrate this via a
> benchmark that I could use to reproduce, then I will consider taking
> these changes.
>

Since you have already traced this, I believe the slow path is not being hit
constantly.

thanks
Hillf