Re: [PATCH] sched: remove the next highest_prio in RT scheduling

From: Hillf Danton
Date: Sat Jun 04 2011 - 00:44:57 EST


On Tue, May 31, 2011 at 10:40 PM, Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> On Sat, 2011-05-28 at 22:25 +0800, Hillf Danton wrote:
>> The next highest_prio element in the rt_rq structure is only used when pulling
>> an RT task. As shown by the following snippet (in diff format for clarity),
>>
>> -       if (src_rq->rt.highest_prio.next >=
>> +       if (src_rq->rt.highest_prio.curr >=
>>             this_rq->rt.highest_prio.curr)
>>                 continue;
>>
>> the "next" could be replaced with "curr" in the above comparison, since
>> the next is no less than curr by definition.
>
> But it completely misses the point of what we are doing. We will never
> pull a running task, but we can pull a waiting task. That's the point of
> the "next" field. We want to know if a high priority task is waiting to
> run, and if so, then we will pull it over to this CPU because this CPU
> is about to switch to a task with a lower priority. If a waiting task of
> higher priority than this CPU is on another CPU, we want to pull it
> over.
>
> This patch totally breaks this. We don't care about "curr" we care about
> "next".
>
Hi Steven

Before the RQ is locked, both next and curr give the same result, possibly an
incorrect one, because the check is racy, as the comment says. After the RQ is
locked, the priority is checked again so that the correct task is pulled and no
running task is included. The core of the patch is that, since the unlocked
check can be wrong with either field, the same outcome can be had without
updating the next field at all.
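
To make the two-stage check concrete, here is a minimal user-space sketch of
the argument. It is a toy model and not the kernel code: struct toy_rq and all
function names are made up for illustration, and there is no real locking or
concurrency, only a frozen snapshot of two runqueues. A lower number means a
higher priority, as in the RT scheduler.

```c
#include <stdbool.h>

/* Toy model, not kernel code. Lower number = higher priority. */
struct toy_rq {
	int curr;	/* highest priority queued here (may be the running task) */
	int next;	/* next highest, i.e. best waiting task; next >= curr */
};

/* Racy pre-check before locking src, using "next" (the existing code). */
static bool skip_using_next(const struct toy_rq *src, const struct toy_rq *dst)
{
	return src->next >= dst->curr;
}

/* Racy pre-check before locking src, using "curr" (as in the patch). */
static bool skip_using_curr(const struct toy_rq *src, const struct toy_rq *dst)
{
	return src->curr >= dst->curr;
}

/* Recheck with src "locked": pull only a waiting task that beats dst. */
static bool pull_under_lock(const struct toy_rq *src, const struct toy_rq *dst)
{
	return src->next < dst->curr;
}

/* Final decision: pre-check first, then the recheck under the lock. */
static bool would_pull(const struct toy_rq *src, const struct toy_rq *dst,
		       bool use_curr)
{
	bool skip = use_curr ? skip_using_curr(src, dst)
			     : skip_using_next(src, dst);

	if (skip)
		return false;	/* src is never locked in this case */
	return pull_under_lock(src, dst);
}
```

In this snapshot model would_pull() returns the same answer for either
pre-check. The two differ only when src->curr < dst->curr <= src->next: the
"next" pre-check skips src without locking it, while the "curr" pre-check
locks src and the recheck then declines to pull.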

thanks
Hillf