Re: [RFC][PATCH 14/18] sched: Remove rq->lock from the first half of ttwu()

From: Peter Zijlstra
Date: Thu Feb 03 2011 - 12:16:14 EST


On Fri, 2011-01-28 at 17:05 -0800, Frank Rowand wrote:
>
> The deadlock can occur if __ARCH_WANT_UNLOCKED_CTXSW and
> __ARCH_WANT_INTERRUPTS_ON_CTXSW are defined.
>
> A task sets p->state = TASK_UNINTERRUPTIBLE, then calls schedule().
>
> schedule()
>     prev->on_rq = 0
>     context_switch()
>         prepare_task_switch()
>             prepare_lock_switch()
>                 raw_spin_unlock_irq(&rq->lock)
>
> At this point, a pending interrupt (on this same cpu) is handled.
> The interrupt handling results in a call to try_to_wake_up() on the
> current process. The try_to_wake_up() gets into:
>
>     while (p->on_cpu)
>         cpu_relax();
>
> and spins forever. This is because "prev->on_cpu = 0" happens slightly
> after this point, at:
>
> finish_task_switch()
>     finish_lock_switch()
>         prev->on_cpu = 0

Right, very good spot!
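
To make the window concrete, here is a minimal user-space sketch of the
sequence above. It is not kernel code: struct task_model, enum switch_step
and both helpers are hypothetical names invented for illustration, and the
model assumes both __ARCH_WANT_UNLOCKED_CTXSW and
__ARCH_WANT_INTERRUPTS_ON_CTXSW, so interrupts are enabled once rq->lock is
dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two task fields discussed in the trace. */
struct task_model {
    bool on_rq;   /* task is queued on a runqueue */
    bool on_cpu;  /* task is still marked as executing on a CPU */
};

/* Steps taken for prev, in the order given in the trace. */
enum switch_step {
    STEP_DEQUEUED,        /* prev->on_rq = 0 */
    STEP_LOCK_DROPPED,    /* raw_spin_unlock_irq(&rq->lock), IRQs now on */
    STEP_SWITCH_FINISHED  /* finish_lock_switch(): prev->on_cpu = 0 */
};

/* State of a running task after every step up to and including 'upto'. */
static struct task_model state_after(enum switch_step upto)
{
    struct task_model t = { .on_rq = true, .on_cpu = true };

    assert(upto <= STEP_SWITCH_FINISHED);
    if (upto >= STEP_DEQUEUED)
        t.on_rq = false;
    if (upto >= STEP_SWITCH_FINISHED)
        t.on_cpu = false;
    return t;
}

/*
 * A wakeup of current from an interrupt on the same CPU spins forever if
 * it arrives while the task is dequeued but still marked on_cpu: the code
 * that clears on_cpu cannot run until the interrupt handler returns.
 */
static bool irq_wakeup_would_spin_forever(enum switch_step step)
{
    struct task_model t = state_after(step);
    bool irqs_enabled = (step >= STEP_LOCK_DROPPED);

    return irqs_enabled && !t.on_rq && t.on_cpu;
}
```

In this model only STEP_LOCK_DROPPED is dangerous: before it interrupts are
still off, and after STEP_SWITCH_FINISHED the spin loop exits immediately.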

>
> One possible fix would be to get rid of __ARCH_WANT_INTERRUPTS_ON_CTXSW.
> I don't suspect the reaction to that suggestion will be very positive...

:-), afaik some architectures require this, i.e. removing it would
require dropping whole architectures.

> Another fix might be:
>
>     while (p->on_cpu) {
>         if (p == current)
>             goto out_activate;
>         cpu_relax();
>     }
>
> Then add back in the out_activate label.
>
> I don't know if the second fix is good -- I haven't thought out how
> it impacts the later patches in the series.

Right, I've done something similar to this: simply short-circuit the cpu
selection to force it to activate the task on the local cpu.
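
A rough user-space sketch of that idea follows. It is not the actual patch:
struct task_model, enum wake_path and wait_for_task() are hypothetical names,
and max_spins exists only so the model terminates outside the kernel, where
the real loop is unbounded and calls cpu_relax().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the one task field the wait loop reads. */
struct task_model {
    bool on_cpu;  /* task is still marked as executing on a CPU */
};

enum wake_path {
    WAKE_LOCAL_ACTIVATE,  /* p == current: skip cpu selection, wake locally */
    WAKE_SELECT_CPU,      /* p is off its cpu: normal cpu selection */
    WAKE_STILL_ON_CPU     /* model only: spin budget ran out */
};

/*
 * Model of the wait in the first half of ttwu(). Waking the task that is
 * currently running on this cpu can never see on_cpu drop, so that case
 * is short-circuited instead of spinning.
 */
static enum wake_path wait_for_task(const struct task_model *p,
                                    const struct task_model *current_task,
                                    unsigned int max_spins)
{
    assert(p != NULL);
    while (p->on_cpu) {
        if (p == current_task)
            return WAKE_LOCAL_ACTIVATE;
        if (max_spins-- == 0)
            return WAKE_STILL_ON_CPU;
        /* cpu_relax() would go here in the kernel */
    }
    return WAKE_SELECT_CPU;
}
```

Calling wait_for_task() with p equal to current_task returns on the first
iteration, which is the case that used to hang; a task running on another
cpu is still waited for as before.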