Re: Crashes with 874bbfe600a6 in 3.18.25

From: Mike Galbraith
Date: Sun Feb 07 2016 - 00:59:57 EST


On Sun, 2016-02-07 at 06:19 +0100, Mike Galbraith wrote:
> On Sat, 2016-02-06 at 11:07 -0200, Henrique de Moraes Holschuh wrote:
> > On Fri, 05 Feb 2016, Tejun Heo wrote:
> > > On Fri, Feb 05, 2016 at 09:59:49PM +0100, Mike Galbraith wrote:
> > > > On Fri, 2016-02-05 at 15:54 -0500, Tejun Heo wrote:
> > > >
> > > > > What are you suggesting?
> > > >
> > > > That 874bbfe6 should die.
> > >
> > > Yeah, it's gonna be killed. The commit is there because the behavior
> > > change broke things. We don't want to guarantee it but have been and
> > > can't change it right away just because we don't like it when things
> > > may break from it. The plan is to implement a debug option to force
> > > workqueue to always execute these work items on a foreign cpu to weed
> > > out breakages.
> >
> > Is there a path to filter down sane behavior (whichever one it might be) to
> > the affected stable/LTS kernels?
>
> What Michal said, replace 874bbfe6 with 176bed1d. Without 22b886dd,
> 874bbfe6 is a landmine, uses add_timer_on() as if it were mod_timer(),
> which it is not, or rather was not until 22b886dd came along, and still
> does not look like the mod_timer() alias that add_timer() is.

BTW, with the 874bbfe6 + 22b886dd pair applied, mundane workqueue timers
are no longer deflected to housekeeping CPUs, so NO_HZ_FULL regresses.

-Mike