Re: System lockup with 2.6.26.8-rt16 on ARM9 [Solved]

From: Remy Bohmer
Date: Sun Aug 30 2009 - 05:35:11 EST


Hi Daniel,


2009/8/29 Daniel Walker <dwalker@xxxxxxxxxx>:
> On Sat, 2009-08-29 at 09:47 +0200, Remy Bohmer wrote:
>
>> Well, we found the root cause of this problem.
>> It turned out to be caused by sched_clock() making discontinuous time jumps.
>> This caused this check to become true in kernel/sched_rt.c:370:
>>         if (rt_rq->rt_time > runtime) {
>>                 rt_rq->rt_throttled = 1;
>>                 if (rt_rq_throttled(rt_rq)) {
>>                         sched_rt_rq_dequeue(rt_rq);
>>                         return 1;
>>                 }
>>         }
>>
>> The end result was that all realtime tasks got throttled for a long
>> time, and that time got extended every time sched_clock() made such a
>> jump. I would never have expected the scheduler to show this kind
>> of behaviour while CONFIG_RT_GROUP_SCHED is _not_ set...
>>
>> The root cause of the faulty sched_clock() was a synchronisation
>> issue between two clock domains: the CPU clock and the clock domain of
>> the peripheral (GPT) on which the sched_clock() implementation was
>> based. The GPT counter occasionally stepped backwards, which triggered
>> a false wraparound detection in the 32->64-bit timestamp conversion,
>> causing the time to jump about 356 seconds into the future...
>>
>
> Can you tell us more about what type of board this was? I've never heard
> of an ARM board having an unstable clocksource before...

It was a Freescale iMX25-based board. (Hmm, looking at it more closely,
the GPT was configured as the clocksource by a driver built by
MontaVista, so this may be of interest to you too. Note that the EPIT
turns out to be much more stable on this processor. I have also never
seen this problem on iMX35-based boards, which used the same driver.)

Remy