RE: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer

From: Zhang, Yanmin
Date: Fri Oct 26 2012 - 07:38:52 EST


>-----Original Message-----
>From: Thomas Gleixner [mailto:tglx@xxxxxxxxxxxxx]
>Sent: Friday, October 26, 2012 5:55 PM
>To: He, Bo
>Cc: linux-kernel@xxxxxxxxxxxxxxx; Peter Zijlstra; Ingo Molnar;
>yanmin_zhang@xxxxxxxxxxxxxxx; Zhang, Yanmin
>Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer
>
>On Fri, 26 Oct 2012, he, bo wrote:
>> From: Yanmin Zhang <yanmin.zhang@xxxxxxxxx>
>>
>> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state !=
>HRTIMER_STATE_CALLBACK).
>> <2>[ 10.226053, 3] kernel BUG at
>/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228
>!
>>
>> Basically, __run_hrtimer has a race with enqueue_hrtimer. When
>> __run_hrtimer calls the timer callback fn, another thread might call
>> enqueue_hrtimer or hrtimer_start to requeue it, and the timer->state
>> is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which
>> causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking
>> fails.
>>
>> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK.
>
>This does not fix it. It makes it worse.
Thanks for your kind comments. We found that and sent V2 at https://lkml.org/lkml/2012/10/26/172.

>
>> Signed-off-by: Yanmin Zhang <yanmin.zhang@xxxxxxxxx>
>> Reviewed-by: He, Bo <bo.he@xxxxxxxxx>
>> ---
>> kernel/hrtimer.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
>> index 6db7a5e..6280184 100644
>> --- a/kernel/hrtimer.c
>> +++ b/kernel/hrtimer.c
>> @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer,
>ktime_t *now)
>> * hrtimer_start_range_ns() or in hrtimer_interrupt()
>> */
>> if (restart != HRTIMER_NORESTART) {
>> - BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
>> + BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK));
>> enqueue_hrtimer(timer, base);
>> }
>
>What you are allowing here is enqueueing an already enqueued timer
>again. I don't know why this does not explode elsewhere, but that's
>probably pure luck. It's not allowed to double enqueue a timer.
You are right.

>
>So no, this is not a solution. The problem is not in the core timer
>code, the problem is in the code which uses that timer.
>
>Your code is returning HRTIMER_RESTART from the timer callback and at
>the same time it starts the timer from some other context. That's what
>needs to be fixed.
The timer user should fix it. But could we also change hrtimer to make it more stable?
At least, instead of panic, could we print some information and go ahead to let kernel
continue?

Thanks,
Yanmin


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/