Re: [RFC][PATCH 17/18] sched: Move the second half of ttwu() tothe remote cpu

From: Peter Zijlstra
Date: Tue Jan 18 2011 - 11:38:01 EST


On Fri, 2011-01-07 at 16:22 +0100, Oleg Nesterov wrote:

> Why sched_fork() does set_task_cpu() ? Just curious, it seems
> that wake_up_new_task() does all we need.

The only reason I can come up with is to properly initialize the
data-structures before make the thing visible, by the time
wake_up_new_task() comes along, its already fully visible.

> ttwu_queue_remote() does "struct task_struct *next = NULL".
> Probably "next = rq->wake_list" makes more sens. Otherwise the
> first cmpxchg() always fails if rq->wake_list != NULL.

Indeed, I think Yong mentioned the same a while back.. done.

> Doesn't __migrate_task() need pi_lock? Consider:
>
> 1. A task T runs on CPU_0, it does set_current_state(TASK_INTERRUBTIBLE)
>
> 2. some CPU does set_cpus_allowed_ptr(T, new_mask), new_mask doesn't
> include CPU_0.
>
> T is running, cpumask_any_and() picks CPU_1, set_cpus_allowed_ptr()
> drops pi_lock and rq->lock before stop_one_cpu().
>
> 3. T calls schedule() and becomes deactivated.
>
> 4. CPU_2 does try_to_wake_up(T, TASK_INTERRUPTIBLE), takes pi_lock
> and sees on_rq == F.
>
> 5. set_cpus_allowed_ptr() resumes and calls stop_one_cpu(cpu => 1).
>
> 6. cpu_stopper_thread() runs on CPU_1 and calls ____migrate_task().
> It locks CPU_0 and CPU_1 rq's and checks task_cpu() == src_cpu.
>
> 7. CPU_2 calls select_task_rq(), it returns (to simplify) 2.
>
> Now try_to_wake_up() does set_task_cpu(T, 2), and calls
> ttwu_queue()->ttwu_do_activate()->activate_task().
>
> 8. __migrate_task() on CPU_1 sees p->on_rq and starts the
> deactivate/activate dance racing with ttwu_do_activate()
> on CPU_2.

Drad, yes I think you're right, now you've got me worried about the
other migration paths too.. however did you come up with that
scenario? :-)

A simple fix would be to keep ->pi_lock locked over the call to
stop_one_cpu() from set_cpus_allowed_ptr().

I think the sched_fair.c load-balance code paths are ok because we only
find a task to migrate after we've obtained both runqueue locks, so even
if we migrate current, it cannot schedule (step 3).

I'm not at all sure about the sched_rt load-balance paths, will need to
twist my head around that..


> And a final question. This is really, really minor, but
> activate_task/deactivate_task are not symmetric, the former
> always sets p->on_rq. Looks correct, but imho a bit confusing and
> can complicate the understanding. Since p->on_rq is cleared
> explicitly by schedule(), perhaps it can be set explicitly to
> in try_to_wake_up_*. Or, perhaps, activate/deactivate can check
> ENQUEUE_WAKEUP/DEQUEUE_SLEEP and set/clear p->on_rq. Once again,
> this is purely cosmetic issue.

Right, only because I didn't want to add conditionals and there's two
ENQUEUE_WAKEUP sites and didn't want to replicate the assignment. I'll
fix it up.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/