Re: [PATCH] perf, x86: catch spurious interrupts after disablingcounters

From: Don Zickus
Date: Fri Sep 24 2010 - 14:12:31 EST


On Fri, Sep 24, 2010 at 12:03:45PM +0200, Robert Richter wrote:
> On 23.09.10 23:18:34, Don Zickus wrote:
>
> > > I was able to duplicate the problem and can confirm this patch fixes the
> > > issue for me. I tried poking around (similar to things Robert probably
> > > did) and had no luck. Something just doesn't make sense, but I guess for
> > > now this patch is good enough for me. :-)
> >
> > Ah ha! I figured out what the problem was, need to disable the pmu while
> > processing the nmi. :-) Finally something simple in this crazy unknown
> > NMI spree.
> >
> > Oh yeah and trace_printk is now my new favorite tool!
> >
> > From: Don Zickus <dzickus@xxxxxxxxxx>
> > Date: Thu, 23 Sep 2010 22:52:09 -0400
> > Subject: [PATCH] x86, perf: disable pmu from counting when processing irq
> >
> > On certain AMD and Intel machines, the pmu was left enabled
> > while the counters were reset during handling of the NMI.
> > After the counters are reset, code continues to process an
> > overflow. During this time another counter overflow interrupt
> > could happen because the counter is still ticking. This leads to
> > an unknown NMI.
>
> I don't like the approach of disabling all counters in the nmi
> handler. First, it stops counting and thus may falsify
> measurement. Second, it introduces much overhead doing a rd-/wrmsrl()
> for each counter.

Ok. good points. Though we may have to re-visit the perf_event_intel.c
code as my patch was based on 'well they did it there, we can do it here'
approach.

>
> Does your patch below solve something that my patch doesn't?

Well, two things I was trying to solve with your approach is the extra NMI
that is generated, and the fact that you may falsely eat an unknown NMI
(because you don't clear cpuc->running except in the special case).

Yeah, I know the heurestics from your patch a couple of weeks ago will
make the case of falsely eating an unknown NMI next to impossible. But I
was trying to limit the number seen.

For example, the back-to-back nmi problem we dealt with a couple of weeks
ago, only had a 'special case' that had to be dealt with every once in a
while (depending on the level of perf activity). Now I can see unknown
NMIs every time there is a pmu_stop() issued. I was trying to figure out
a way to cut down on that noise.

I came up with a couple of approaches but both involve another rdmsrl in
the handle_irq path. I thought I would toss these ideas out for
conversation.

the first approach gets rid of the extra nmi by waiting until the end to
re-write the counter, but adds a second rdmsrl to resync the period in the
case where pmu_stop is not called.


diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 48c6d8d..1642f48 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1175,11 +1175,22 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
handled++;
data.period = event->hw.last_period;

- if (!x86_perf_event_set_period(event))
- continue;
+ /*
+ * if period is over, process the overflow
+ * before reseting the counter, otherwise
+ * a new overflow could occur before the
+ * event is stopped
+ */
+ if (local64_read(&hwc->period_left) <= 0) {
+ if (perf_event_overflow(event, 1, &data, regs)) {
+ x86_pmu_stop(event, 0);
+ continue;
+ }
+ /* if the overflow doesn't stop the event, resync */
+ x86_perf_event_update(event);
+ }

- if (perf_event_overflow(event, 1, &data, regs))
- x86_pmu_stop(event, 0);
+ x86_perf_event_set_period(event);
}

if (handled)

and the second approach, accepts the fact that we will get another unknown
NMI but check for it first (after stopping) and clear the cpuc->running
bit if we didn't overflow yet.

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 18c8856..bfb05da 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1189,8 +1189,12 @@ static int x86_pmu_handle_irq(struct pt_regs *regs)
if (!x86_perf_event_set_period(event))
continue;

- if (perf_event_overflow(event, 1, &data, regs))
+ if (perf_event_overflow(event, 1, &data, regs)) {
x86_pmu_stop(event, 0);
+ rdmsrl(hwc->event_base + idx, val);
+ if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
+ __clear_bit(idx, cpuc->running);
+ }
}

if (handled)


After dealing with the Intel folks on another thread about how they are
finding more ways to detect and report hardware errors with NMIs, that I was
getting nervous about generating so many false NMIs and accidentally
eating a real one.

Cheers,
Don

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/