Re: Crashes with 874bbfe600a6 in 3.18.25

From: Mike Galbraith
Date: Thu Feb 04 2016 - 06:07:36 EST


On Thu, 2016-02-04 at 11:46 +0100, Thomas Gleixner wrote:
> On Thu, 4 Feb 2016, Mike Galbraith wrote:

> > I'm also wondering why 22b886dd only applies to kernels >= 4.2.
> >
> >
> > Regardless of the previous CPU a timer was on, add_timer_on()
> > currently simply sets timer->flags to the new CPU. As the caller must
> > be seeing the timer as idle, this is locally fine, but the timer
> > leaving the old base while unlocked can lead to race conditions as
> > follows.
> >
> > Let's say timer was on cpu 0.
> >
> >   cpu 0                                   cpu 1
> >   -----------------------------------------------------------------------------
> >   del_timer(timer) succeeds
> >                                           del_timer(timer)
> >                                             lock_timer_base(timer) locks cpu_0_base
> >   add_timer_on(timer, 1)
> >     spin_lock(&cpu_1_base->lock)
> >     timer->flags set to cpu_1_base
> >     operates on @timer                      operates on @timer
> >
> >
> > What's the difference between...
> >         timer->flags = (timer->flags & ~TIMER_BASEMASK) | cpu;
> > and...
> >         timer_set_base(timer, base);
> >
> > ...that makes that fix unneeded prior to 4.2? We take the same locks
> > in < 4.2 kernels, so seemingly both will diddle concurrently above.
>
> Indeed, you are right.

Whew, thanks for confirming.  Looking for what the hell I was missing
wasn't going well at all, and it ate most of my day.

> The same can happen on pre 4.2, the fix just does not apply as-is because we
> changed the internals of how the base is managed in the timer itself.
> Backport below.

Exactly what I did locally.
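
For anyone following along, here is a rough userspace sketch of the
idea behind the fix: take the lock of the base the timer currently
points at, mark the timer as migrating, and only then switch to the new
base.  This is not the backport and not kernel code.  The names
add_timer_on_model(), base_lock(), base_unlock(), the tiny
CPUMASK/MIGRATING values and the atomic_flag spinlock are all stand-ins
I made up for illustration, and it models the >= 4.2 flags encoding
rather than the pre-4.2 base pointer.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS    2
#define CPUMASK    0x0ffu               /* simplified stand-in for TIMER_CPUMASK */
#define MIGRATING  0x100u               /* simplified stand-in for TIMER_MIGRATING */
#define BASEMASK   (CPUMASK | MIGRATING)

struct timer_base { atomic_flag lock; };          /* toy spinlock per CPU base */
struct timer      { _Atomic unsigned int flags; };

struct timer_base bases[NR_CPUS] = { { ATOMIC_FLAG_INIT }, { ATOMIC_FLAG_INIT } };

void base_lock(struct timer_base *b)
{
	/* Spin until we are the one who set the flag. */
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;
}

void base_unlock(struct timer_base *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Lock whichever base the timer belongs to, retrying if it moved meanwhile. */
struct timer_base *lock_timer_base(struct timer *t)
{
	for (;;) {
		unsigned int f = atomic_load(&t->flags);

		if (!(f & MIGRATING)) {
			struct timer_base *b = &bases[f & CPUMASK];

			base_lock(b);
			if (atomic_load(&t->flags) == f)
				return b;	/* still ours, return with lock held */
			base_unlock(b);
		}
		/* Timer is mid-migration or changed under us: try again. */
	}
}

/* Model of the fixed add_timer_on(): never retarget the timer unlocked. */
void add_timer_on_model(struct timer *t, unsigned int cpu)
{
	struct timer_base *new_base, *base;

	assert(cpu < NR_CPUS);
	new_base = &bases[cpu];

	base = lock_timer_base(t);		/* old base is locked here */
	if (base != new_base) {
		/* Make concurrent lock_timer_base() callers spin, not proceed. */
		atomic_fetch_or(&t->flags, MIGRATING);
		base_unlock(base);
		base = new_base;
		base_lock(base);
		/* Clears MIGRATING and sets the new CPU in one store. */
		atomic_store(&t->flags,
			     (atomic_load(&t->flags) & ~BASEMASK) | cpu);
	}
	/* A real implementation would enqueue the timer on @base here. */
	base_unlock(base);
}
```

The point of the model is that a concurrent del_timer() sitting in
lock_timer_base() either holds the old base lock before the switch or
sees MIGRATING and retries, so both sides never operate on the timer
under different locks.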

-Mike