Re: v3.13-rc6+ regression (ARM board)

From: John Stultz
Date: Thu Jan 02 2014 - 16:34:28 EST


On 01/02/2014 12:43 PM, Linus Torvalds wrote:
> On Thu, Jan 2, 2014 at 12:30 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
>> So something else may be at play. Even with Linus' patch I reproduced a
>> similar hang here.
>>
>> Still chasing it down, but it looks like a seqlock deadlock where we're
>> calling read while holding the lock.
> Hmm. Only with lockdep, right?

Yep.

> Does lockdep perhaps read the scheduler clock? Afaik, we have
> lockstat_clock(), which uses local_clock(), which in turn translates
> to sched_clock_cpu(smp_processor_id())..
>
> So if that code now tries to read the scheduler clock when
> update_sched_clock() is doing an update and has done a
> write_seqcount_begin()...

Sigh. Deadlock by deadlock detection code.

So yeah, it looks like this is the case, though I've not been able to
get a backtrace during the hang to fully validate it (I'm just using
qemu's info registers and looking at the pc and lr).
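
For anyone following along, here is a rough user-space sketch of the
recursion as I understand it. It is not the kernel code: every sim_* name
is made up for illustration, the "hook" merely stands in for lockdep's
lock statistics reading the clock, and the MAX_SPINS retry cap exists only
so the demo terminates (a real seqcount reader has no such cap and would
spin forever on the same CPU).

```c
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's seqcount_t. */
struct seqcount_sim {
	volatile unsigned int sequence;	/* odd while a write is in progress */
};

static struct seqcount_sim clock_seq;
static unsigned long long clock_ns;

/* Retry budget so the demo terminates; assumption for illustration only. */
#define MAX_SPINS 1000u

/*
 * Reader: wait for an even sequence, copy the value, then recheck.
 * Returns false once the retry budget is exhausted, which marks the
 * point where an uncapped reader would hang.
 */
static bool sim_read_clock(unsigned long long *out)
{
	unsigned int spins, start;

	for (spins = 0; spins < MAX_SPINS; spins++) {
		start = clock_seq.sequence;
		if (start & 1)
			continue;	/* writer active: retry */
		*out = clock_ns;
		if (clock_seq.sequence == start)
			return true;	/* consistent snapshot */
	}
	return false;
}

/* Stand-in for instrumentation that timestamps lock events. */
static bool sim_lockstat_hook(void)
{
	unsigned long long now;

	return sim_read_clock(&now);
}

/*
 * Writer: if `instrumented` is set, the hook runs inside the write
 * section on the same thread of execution, so the sequence is odd for
 * the whole duration of the hook's read attempt.
 * Returns whether the hook's read succeeded.
 */
static bool sim_update_clock(unsigned long long ns, bool instrumented)
{
	bool hook_ok = true;

	clock_seq.sequence++;		/* write begin: sequence now odd */
	if (instrumented)
		hook_ok = sim_lockstat_hook();
	clock_ns = ns;
	clock_seq.sequence++;		/* write end: sequence even again */
	return hook_ok;
}
```

Calling sim_update_clock(ns, false) completes normally, while
sim_update_clock(ns, true) reports that the nested read never saw an even
sequence. The writer cannot finish until the reader returns, and the
reader cannot return until the writer finishes.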


So I'm guessing we'll just have to disable the lockdep logic here, which
is a little sad, since I'm a little nervous about the generic
sched_clock's locking (i.e. it works ok for ARM, but it's not NMI safe), and
having some better debugging tools there would be helpful.


Anyway, I'll send out a patch to disable the lockdep usage here shortly.

thanks
-john