Re: Crashes with 874bbfe600a6 in 3.18.25

From: Thomas Gleixner
Date: Tue Jan 26 2016 - 04:50:57 EST


On Tue, 26 Jan 2016, Jan Kara wrote:
> On Sat 23-01-16 17:11:54, Thomas Gleixner wrote:
> > On Sat, 23 Jan 2016, Ben Hutchings wrote:
> > > On Fri, 2016-01-22 at 11:09 -0500, Tejun Heo wrote:
> > > > > Looks like it requires more than trivial backport (I think). Tejun?
> > > >
> > > > The timer migration has changed quite a bit.  Given that we've never
> > > > seen vmstat work crashing in 3.18 era, I wonder whether the right
> > > > thing to do here is reverting 874bbfe600a6 from 3.18 stable?
> > >
> > > It's not just 3.18 that has this; 874bbfe600a6 was backported to all
> > > stable branches from 3.10 onward.  Only the 4.2-ckt branch has
> > > 22b886dd10180939.
> >
> > 22b886dd10180939 fixes a bug which was introduced with the timer wheel
> > overhaul in 4.2. So only 4.2/3 should have it backported.
>
> Thanks for explanation. So do I understand right that timers are always run
> on the calling CPU in kernels prior to 4.2 and thus commit 874bbfe600a6 (to
> run timer for delayed work on the calling CPU) doesn't make sense there? If
> that is true then reverting the commit from older stable kernels is
> probably the easiest way to resolve the crashes.

I was merely referring to 22b886dd10180939, which is a bug fix for things we
reworked in the timer wheel core code in 4.2. It's completely unrelated to the
problem at hand.

Non-pinned timers can be migrated due to power saving decisions, and that has
been the case since 2.6.36. What changed over time is how the decision is
made, but the general principle still applies.

The timer code was completely unchanged between 3.18 and 4.0, and even with the
larger overhaul in 4.2 we did not change the migration logic. We merely
changed the internal implementation of the timer wheel.

I have no idea how 874bbfe600a6 can result in crashes on older kernels. Can
you ask the reporter to enable DEBUG_OBJECTS so we might get an idea of what
goes wrong with that timer?
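
For reference, a config fragment along these lines should be enough. This is
only a sketch: it assumes the reporter can rebuild the affected kernel, and the
exact set of sub-options is a suggestion rather than a requirement. The timer
and work trackers are the ones relevant to delayed work.

```
# Core debug objects infrastructure
CONFIG_DEBUG_OBJECTS=y
# Track timer_list objects (the timer behind the delayed work)
CONFIG_DEBUG_OBJECTS_TIMERS=y
# Track work_struct objects
CONFIG_DEBUG_OBJECTS_WORK=y
# Catch objects that are freed while still active
CONFIG_DEBUG_OBJECTS_FREE=y
# Enable the checks at boot without needing a command line option
CONFIG_DEBUG_OBJECTS_ENABLE_DEFAULT=1
```

Any warnings it produces would show up in dmesg with an "ODEBUG:" prefix.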

Thanks,

tglx