Re: perf: WARNING perfevents: irq loop stuck!

From: Vince Weaver
Date: Fri May 08 2015 - 00:17:01 EST


On Fri, 1 May 2015, Ingo Molnar wrote:

> So 0000fffffffffffe corresponds to 2 events left until overflow,
> right? And on Haswell we don't set x86_pmu.limit_period AFAICS, so we
> allow these super short periods.
>
> Maybe like on Broadwell we need a quirk on Nehalem/Haswell as well,
> one similar to bdw_limit_period()? Something like the patch below?
>
> Totally untested and such. I picked 128 because of Broadwell, but
> lower values might work as well. You could try to increase it to 3 and
> upwards and see which one stops triggering stuck NMI loops?

I spent a lot of time trying to come up with a test case that triggered
this more reliably but failed.

It definitely is an issue with PMC0 being -2, which causes the PMC0 bit in
the status register to get stuck and not clear. Often there is also a PEBS
event active at the same time, but that might be coincidence.
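
For reference, here is a minimal sketch of the arithmetic behind the "-2"
value. It assumes 48-bit wide counters, which is what the 0000fffffffffffe
value quoted above suggests. The names CNTVAL_BITS, counter_start_value()
and events_until_overflow() are made up for illustration and are not kernel
symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: general-purpose counters are 48 bits wide. */
#define CNTVAL_BITS 48
#define CNTVAL_MASK ((UINT64_C(1) << CNTVAL_BITS) - 1)

/*
 * Value to program into the counter so that it overflows after
 * `left` more events: the negated period, truncated to counter width.
 */
static uint64_t counter_start_value(uint64_t left)
{
	assert(left > 0 && left <= CNTVAL_MASK);
	return (UINT64_C(0) - left) & CNTVAL_MASK;
}

/* Inverse mapping: how many events remain before the counter wraps. */
static uint64_t events_until_overflow(uint64_t raw)
{
	return (CNTVAL_MASK + 1) - (raw & CNTVAL_MASK);
}
```

Under that assumption a period of 2 maps to 0x0000fffffffffffe, and reading
that value back gives 2 events left until overflow.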

With your patch applied I can't trigger the issue. I haven't tried
narrowing down the exact value yet.
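
The kind of clamp being discussed can be sketched roughly as below. This is
not the patch itself, only an illustration of the idea of enforcing a
minimum period. clamp_period() and MIN_PERIOD_GUESS are hypothetical names,
and 128 is simply the value borrowed from the Broadwell quirk mentioned
above, not a verified lower bound.

```c
#include <stdint.h>

/* Assumption: 128 is a safe floor; smaller values are still untested. */
#define MIN_PERIOD_GUESS 128u

/*
 * Raise very short remaining periods to `min_period` so the counter
 * is never programmed just a couple of events away from overflow.
 */
static unsigned int clamp_period(unsigned int left, unsigned int min_period)
{
	return left < min_period ? min_period : left;
}
```

Narrowing down the exact value would then amount to retrying with smaller
min_period arguments (3 and upwards) instead of MIN_PERIOD_GUESS.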

Vince