Re: rb tree hrtimer lockup bug (found by perf_fuzzer)

From: Thomas Gleixner
Date: Sun Mar 23 2014 - 11:14:37 EST


On Sat, 22 Mar 2014, Thomas Gleixner wrote:

> On Sat, 22 Mar 2014, Thomas Gleixner wrote:
> > On Fri, 21 Mar 2014, Vince Weaver wrote:
> >
> > > On Fri, 21 Mar 2014, Thomas Gleixner wrote:
> > > >
> > > > I'm a complete idiot. I was staring at oaddr and did not notice that
> > > > descr->name is the real culprit. Sorry. Delta patch below.
> > >
> > > OK. The log was much longer this time, attached.
> >
> > Hmmm.
> >
> > [ 2.739858] NULL pointer dereference at (null)
> > [ 2.747390] IP: [< (null)>] (null)
> > [ 2.752970] PGD 0
> > [ 2.755287] Oops: 0010 [#1] SMP
> >
> > So this time the CPU branched to NULL. So let me recap.
> >
> > First you had the explosion in the hrtimer code. After enabling debug
> > stuff it went to the timer_list and now it looks different again.
> >
> > So that looks more like a random memory corruption.
> >
> > Nasty to debug. And of course it does not reproduce here. I'll throw
> > your config at more machines in the hope that something will trigger
> > it.
>
> I've refined the trace_printk stuff in the hope to get a bit more info
> out of it.

You might also try with the trace_printks removed. That won't give us
the history, but maybe then the bug happens again at some decodable
place.
