Re: Problem with perf hardware counters grouping

From: Vince Weaver
Date: Tue Sep 06 2011 - 15:43:24 EST


On Thu, 1 Sep 2011, Peter Zijlstra wrote:
>
> Both those have 4 generic hardware counters, but x86 defaults to
> enabling the NMI watchdog which takes one, leaving you with 3 (try: echo
> 0 > /proc/sys/kernel/nmi_watchdog). If you had looked at your dmesg
> output you'd have found lines like:
>
> NMI watchdog enabled, takes one hw-pmu counter.
>
> The code can only check if the group as a whole could possibly fit on a
> PMU, which is where your failure on >4 comes from.
>
> What happens with your >3 case is that while the group is valid and
> could fit on the PMU, it won't fit at runtime because the NMI watchdog
> is taking one and won't budge (cpu-pinned counters have precedence over
> any other kind), effectively starving your group of pmu runtime.
>
> Also, we should fix that return to say -EINVAL or so.

So any hope of a fix on this?

As mentioned, this is a serious problem for PAPI, and I am trying to find
a good way to enable a workaround that doesn't punish people who have the
watchdog disabled.

Is there a "stable" API method of determining if the nmi_watchdog is
present and stealing a perf-counter?

If I find a "1" in /proc/sys/kernel/nmi_watchdog can I assume a counter is
being stolen?
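
To make the question concrete, the check I have in mind would look roughly
like the sketch below. It assumes that a nonzero value in
/proc/sys/kernel/nmi_watchdog means the watchdog is active and holding
exactly one counter, which is precisely the part I don't know is safe to
rely on. read_watchdog_flag() and usable_counters() are made-up names for
illustration, not existing PAPI or kernel interfaces.

```c
#include <stdio.h>

/*
 * Read an integer flag from a sysctl-style file such as
 * /proc/sys/kernel/nmi_watchdog.
 * Returns 1 if the file starts with a nonzero integer, 0 if it starts
 * with 0, and -1 if the file cannot be opened or parsed.
 */
static int read_watchdog_flag(const char *path)
{
    FILE *f = fopen(path, "r");
    int value;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &value) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return value != 0;
}

/*
 * Hypothetical helper: ASSUMES an enabled watchdog holds exactly one
 * hardware counter. If the flag is 0 or unknown (-1), the full count
 * is returned unchanged.
 */
static int usable_counters(int hw_counters, int watchdog_flag)
{
    if (watchdog_flag == 1 && hw_counters > 0)
        return hw_counters - 1;
    return hw_counters;
}
```

With 4 generic counters, usable_counters(4,
read_watchdog_flag("/proc/sys/kernel/nmi_watchdog")) would give 3 when the
file contains "1" and 4 otherwise, so people with the watchdog disabled
would not lose a counter.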

Vince
