Re: perf: fuzzer triggers NULL pointer dereference in x86_schedule_events

From: Peter Zijlstra
Date: Thu May 07 2015 - 08:43:21 EST


On Mon, May 04, 2015 at 12:32:56PM -0700, Stephane Eranian wrote:
> On Fri, May 1, 2015 at 5:59 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Apr 30, 2015 at 03:08:56PM -0400, Vince Weaver wrote:
> > >
> > > So the perf_fuzzer caught this after about a week of fuzzing on a Haswell
> > > machine running a recent git kernel (pre 4.1-rc1 though).
> > >
> > > We've seen this BUG before and various fixes were applied but apparently
> > > it wasn't enough.
> > >
> > > Sadly it doesn't seem to be reproducible.
> > >
> > > validate_group() -> x86_pmu.schedule_events() -> ???? -> variable_test_bit()
> > > (hard to tell which test bit with all the inlining going on).
> >
> > Assuming you build with debug info, addr2line -i can help, but I think I
> > found it by comparing the Code section below with my objdump -D output.
> >
> > It's:
> > 	/* constraint still honored */
> > 	if (!test_bit(hwc->idx, c->idxmsk))
> > 		break;
> >
> > Which would seem to suggest c is NULL.
> >
> But then, you'd crash in the previous loop, because after
> get_event_constraints(), you touch
> c->weight.

Indeed so; and we can make an analogous argument for hwc. However:

> I think it is more likely related to the bitmask (idxmsk). But then
> it is always allocated with the constraint, even with the HT bug
> workaround. So most likely the index is bogus and you touch outside
> the idxmsk[] array.

[428232.701319] BUG: unable to handle kernel NULL pointer dereference at (null)

But the thing really tried to touch NULL, not some random address that
faulted.
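To make the two objections concrete, here is a minimal user-space sketch of the address arithmetic involved. It assumes a hypothetical, simplified `struct model_constraint` in which the `idxmsk` bitmap is the first member and `weight` follows it, and it models a generic word-indexed `test_bit()` rather than the x86 `variable_test_bit()` (which uses the `bt` instruction). `bitmap_word_addr()` and `weight_addr()` are illustrative helpers, not kernel functions. Under those assumptions, a NULL `c` makes the `c->weight` read in the first loop fault at a small non-zero offset, and a bogus index with a valid `c` faults somewhere past `c`, so only a NULL `c` combined with a small index lands exactly on address 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Bits per bitmap word, as in a generic word-indexed test_bit(). */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/*
 * Hypothetical, simplified stand-in for a scheduling constraint.
 * Assumption: the idxmsk bitmap is the first member and weight
 * comes after it; this is a model, not the kernel's definition.
 */
struct model_constraint {
	unsigned long idxmsk[1];
	uint64_t code;
	uint64_t cmask;
	int weight;
};

static_assert(offsetof(struct model_constraint, idxmsk) == 0,
	      "model assumes idxmsk is the first member");

/*
 * Address that test_bit(idx, c->idxmsk) would load from, given the
 * numeric value of the constraint pointer c. Plain integer arithmetic
 * is used so that a "NULL" c can be modelled without dereferencing it.
 */
static uintptr_t bitmap_word_addr(uintptr_t c, unsigned int idx)
{
	return c + offsetof(struct model_constraint, idxmsk)
		 + (idx / BITS_PER_LONG) * sizeof(unsigned long);
}

/* Address that the earlier c->weight read would load from. */
static uintptr_t weight_addr(uintptr_t c)
{
	return c + offsetof(struct model_constraint, weight);
}
```

For example, `bitmap_word_addr(0, 3)` is 0, while `weight_addr(0)` equals `offsetof(struct model_constraint, weight)`, which is non-zero. In this model that mismatch is the puzzle: if `c` had been NULL in both loops, the `c->weight` read should already have faulted at that non-zero offset.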

As always, Vince has found us a good puzzle ;-)