Re: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM

From: Peter Zijlstra
Date: Thu May 26 2011 - 06:55:21 EST


On Thu, 2011-05-26 at 15:29 +0800, Yong Zhang wrote:
> > Figuring out why the existing condition failed
>
> It seems 'current' changes across switch_to(), since it is derived from
> the sp register.
> So if the interrupt comes before we switch sp, 'p == current' will
> catch it, but if it comes after we switch sp, we will lose a wakeup.

Well, losing a wakeup isn't the problem here (although it would be a
problem); the immediate problem is that we're getting stuck
(livelocked) in that while (p->on_cpu) loop.

But yes, I think that explains it: if the interrupt hits
context_switch() after current was changed but before p->on_cpu is
cleared, we would livelock in interrupt context, since the switch that
would clear p->on_cpu cannot resume until the interrupt returns.

Now we could of course go add in_interrupt() checks there, but that
would make this already fragile path more interesting, so I think I'll
stick with the proposed patch -- again provided it actually works.

Marc, any word on that?