Re: [patch] SMP races in the timer code, timer-fix-2.6.0-test7-A0

From: Ingo Molnar
Date: Sat Oct 11 2003 - 11:15:05 EST



On Sat, 11 Oct 2003, Manfred Spraul wrote:

> What about moving the "timer running" information into the timer_list,
> instead of keeping it in the base? For example base=0 means neither
> running nor pending. base=1 means running, but not pending, and pointers
> mean pending on the given base.
>
> This would allow an atomic test without the brute force locking.

it's not so simple. Firstly, it would burden some of the other timer
codepaths with extra logic. (mod/add/del_timer) Secondly, the use of
timer->base is closely controlled, and it's not that simple to clear the
value of '1' from timer->base after the timer has run. [this could race
with any other CPU.]

it would be much cleaner to add another timer->running field, especially
since this would be the 8th word-sized field in struct timer_list, making
it a nice round structure size.

btw., there's a third type of timer race we have. If a timer function is
delayed by more than 1 timer tick [which could happen under eg. UML], then
it's possible for the timer function to run on another CPU in parallel to
the already executing timer function.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/