Re: [GIT PULL] perf crash fix

From: Peter Zijlstra
Date: Thu Jun 03 2010 - 05:33:28 EST


On Thu, 2010-06-03 at 11:26 +0200, Peter Zijlstra wrote:
> On Thu, 2010-06-03 at 05:13 +0200, Frederic Weisbecker wrote:
> > What happens here is a double pmu->disable() due to a race between
> > two perf_adjust_period().
> >
> > We first overflow a page fault event and then re-adjust the period.
> > When we reset the period_left, we stop the pmu by removing the
> > perf event from the software event hlist. And just before we
> > re-enable it, we are interrupted by a sched tick that also tries to
> > re-adjust the period. There we eventually disable the event a second
> > time, which leads to a double hlist_del_rcu() that ends up
> > dereferencing LIST_POISON2.
> >
> > In fact, the goal of embracing the reset of the period_left with
> > a pmu:stop() and pmu:start() is only relevant to hardware events. We
> > want them to reprogram the next period interrupt.
> >
> > But this is useless for software events. They have their own way to
> > handle the period left, and in a non-racy way. They don't need to
> > be stopped here.
> >
> > So, use a new pair of perf_event_stop/start_hwevent that only stop
> > and restart hardware events in this path.
> >
> > The race won't happen with hardware events as sched ticks can't
> > happen during nmis.
>
> I've queued the below.

Uhhm, the compiler reminded me of trace events..

---
Subject: perf: Fix crash in swevents
From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Date: Thu Jun 03 11:21:20 CEST 2010

Frederic reported that because swevents handling doesn't disable IRQs
anymore, we can get a recursion of perf_adjust_period(), once from
overflow handling and once from the tick.

If both call ->disable, we get a double hlist_del_rcu() and trigger
a LIST_POISON2 dereference.

Since we don't actually need to stop/start a swevent to re-programm
the hardware (lack of hardware to program), simply nop out these
callbacks for the swevent pmu.

Reported-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
LKML-Reference: <1275557202.27810.35201.camel@twins>
---
kernel/perf_event.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -4055,13 +4055,6 @@ static void perf_swevent_overflow(struct
}
}

-static void perf_swevent_unthrottle(struct perf_event *event)
-{
- /*
- * Nothing to do, we already reset hwc->interrupts.
- */
-}
-
static void perf_swevent_add(struct perf_event *event, u64 nr,
int nmi, struct perf_sample_data *data,
struct pt_regs *regs)
@@ -4276,11 +4269,22 @@ static void perf_swevent_disable(struct
hlist_del_rcu(&event->hlist_entry);
}

+static void perf_swevent_void(struct perf_event *event)
+{
+}
+
+static int perf_swevent_int(struct perf_event *event)
+{
+ return 0;
+}
+
static const struct pmu perf_ops_generic = {
.enable = perf_swevent_enable,
.disable = perf_swevent_disable,
+ .start = perf_swevent_int,
+ .stop = perf_swevent_void,
.read = perf_swevent_read,
- .unthrottle = perf_swevent_unthrottle,
+ .unthrottle = perf_swevent_void, /* hwc->interrupts already reset */
};

/*
@@ -4561,8 +4565,10 @@ static int swevent_hlist_get(struct perf
static const struct pmu perf_ops_tracepoint = {
.enable = perf_trace_enable,
.disable = perf_trace_disable,
+ .start = perf_swevent_int,
+ .stop = perf_swevent_void,
.read = perf_swevent_read,
- .unthrottle = perf_swevent_unthrottle,
+ .unthrottle = perf_swevent_void,
};

static int perf_tp_filter_match(struct perf_event *event,

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/