Re: [tip:perf/urgent] perf, x86: Catch spurious interrupts after disabling counters

From: Don Zickus
Date: Wed Sep 29 2010 - 16:03:54 EST


On Wed, Sep 29, 2010 at 09:42:26PM +0200, Stephane Eranian wrote:
> On Wed, Sep 29, 2010 at 8:12 PM, Don Zickus <dzickus@xxxxxxxxxx> wrote:
> > Robert,
> >
> > I think you missed Stephane's point.  Say for example, kgdb is being used
> > while we are doing stuff with the perf counter (and say kgdb's handler is
> > a lower priority than perf; which isn't true I know, but let's say):
> >
> Yes, exactly my point. The reality is you cannot afford to have false positives
> because you may starve another subsystem of an important notification.
>
> I think it boils down to whether or not we need an error message (Dazed) in
> case no subsystem claimed the NMI. If you were to just silently consume the
> NMI when no subsystem claims it, then you would not have these issues.
>
> What Don has done is use a heuristic which gets activated when a PMU
> interrupt handler signals that more than one counter has overflowed. His
> claim is that this situation is likely to trigger back-to-back.

Actually it's Robert's heuristic. :-)

>
> The reason this heuristic works is because it waits until ALL the subsystems
> have seen the notification before it declares that the NMI was PMU spurious.
> To do that it uses the DIE_NMI_UNKNOWN callchain. Handlers on this chain
> get called last, after all subsystems have seen the notification once. I believe
> that is the only way to safely "consume" a "spurious" NMI and avoid
> the 'Dazed' message. Anything else runs the risk of starving the other
> subsystems.

I agree.
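
To make the ordering concrete, here is a minimal userspace sketch of the
two-pass idea. It is not the kernel code: every name in it (chain[],
deliver_nmi(), pmu_skip_budget, the EV_* and NMI_* enums) is made up for
illustration, and it assumes the hypothetical priority order from my example
above, with perf ahead of kgdb. The first pass plays the role of the normal
NMI chain, the second the role of DIE_NMI_UNKNOWN.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types standing in for the normal and "unknown" passes. */
enum nmi_event { EV_NMI, EV_NMI_UNKNOWN };
enum nmi_result { NMI_CLAIMED, NMI_SWALLOWED, NMI_DAZED };

typedef bool (*nmi_handler_t)(enum nmi_event ev);

/* Simulated state, set by the caller before delivering an NMI. */
static int  pmu_overflowed;   /* counters that overflowed for this NMI */
static int  pmu_skip_budget;  /* unclaimed NMIs the PMU may still swallow */
static bool kgdb_pending;     /* kgdb has a real NMI waiting */

static bool pmu_handler(enum nmi_event ev)
{
	if (ev == EV_NMI) {
		if (pmu_overflowed == 0)
			return false;   /* not ours: let others look at it */
		/* More than one overflow: expect one back-to-back NMI. */
		pmu_skip_budget = (pmu_overflowed > 1) ? 1 : 0;
		pmu_overflowed = 0;
		return true;
	}
	/* EV_NMI_UNKNOWN: every subsystem has already declined this NMI. */
	if (pmu_skip_budget > 0) {
		pmu_skip_budget--;
		return true;
	}
	return false;
}

static bool kgdb_handler(enum nmi_event ev)
{
	if (ev == EV_NMI && kgdb_pending) {
		kgdb_pending = false;
		return true;
	}
	return false;
}

/* Assumed priority order: perf first, then kgdb. */
static nmi_handler_t chain[] = { pmu_handler, kgdb_handler };

static enum nmi_result deliver_nmi(void)
{
	size_t n = sizeof(chain) / sizeof(chain[0]);
	size_t i;

	/* Pass 1: everyone gets a chance to claim the NMI. */
	for (i = 0; i < n; i++)
		if (chain[i](EV_NMI))
			return NMI_CLAIMED;

	/* Pass 2: only now may a handler consume it as spurious. */
	for (i = 0; i < n; i++)
		if (chain[i](EV_NMI_UNKNOWN))
			return NMI_SWALLOWED;

	return NMI_DAZED;   /* nobody wanted it: the 'Dazed' case */
}
```

With pmu_overflowed = 2, the first deliver_nmi() is claimed by the PMU and
arms the skip budget. If kgdb_pending is then set, the next NMI still reaches
kgdb, because the PMU only swallows during the second pass. One
simplification to keep in mind: the budget here survives until the next
unclaimed NMI, rather than being tied to the NMI that immediately follows.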

Cheers,
Don