[BUG] perf_events: NMI watchdog event cannot be throttled

From: Stephane Eranian
Date: Wed Aug 18 2010 - 16:26:33 EST


Hi,

I ran into an issue where the NMI watchdog does not fire in a deadlock
situation. After some debugging, I found the source of the problem.

The NMI watchdog is currently subject, like any other event, to interrupt
throttling. The heart of the problem is that if you are deadlocked on a CPU
with interrupts masked, the timer interrupt won't fire, and therefore the
hwc->interrupts field won't be reset. Then, depending on the max sampling
rate, you could eventually
fail the max interrupt rate test in __perf_event_overflow() and perf_events
would throttle, i.e., stop, the NMI watchdog event before the 5s delay to
panic. Thus, you would never get the panic. I ran into this problem myself.
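
To make the failure mode concrete, here is a small user-space model of the
rate check described above. It is only a sketch, not kernel code: the struct
and function names are made up for illustration, HZ=1000 is assumed, and the
max sample rate of 2000 used in the note below is an arbitrary low value
chosen to show the effect quickly.

```c
#include <stdint.h>

#define HZ 1000                 /* assumed timer tick rate */
#define MAX_INTERRUPTS (~0ULL)  /* sentinel value meaning "event is throttled" */

/* Illustrative stand-in for the per-event hardware counter state. */
struct hw_counter {
    uint64_t interrupts;        /* overflows seen since the last timer tick */
};

/* Models the max interrupt rate test; returns 1 once the event is throttled. */
static int overflow(struct hw_counter *hwc, uint64_t max_sample_rate)
{
    if (hwc->interrupts == MAX_INTERRUPTS)
        return 1;
    hwc->interrupts++;
    if ((uint64_t)HZ * hwc->interrupts > max_sample_rate) {
        hwc->interrupts = MAX_INTERRUPTS;
        return 1;
    }
    return 0;
}

/* Models the timer tick, which never runs while interrupts are masked. */
static void timer_tick(struct hw_counter *hwc)
{
    hwc->interrupts = 0;
}

/*
 * Deliver up to nmis_needed watchdog NMIs. If ticks_run is zero, the CPU is
 * treated as deadlocked with interrupts masked, so the count is never reset.
 * Returns the NMI number at which the event was throttled, or 0 if all
 * nmis_needed NMIs were delivered (i.e., the watchdog could panic).
 */
static int nmis_until_throttle(uint64_t max_sample_rate, int nmis_needed,
                               int ticks_run)
{
    struct hw_counter hwc = { 0 };

    for (int n = 1; n <= nmis_needed; n++) {
        if (ticks_run)
            timer_tick(&hwc);
        if (overflow(&hwc, max_sample_rate))
            return n;
    }
    return 0;
}
```

With these assumed numbers, nmis_until_throttle(2000, 5, 0) stops at the
third NMI, so a watchdog that needs five NMIs never gets to panic, whereas
the same call with ticks_run=1 delivers all five.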

This is a serious issue because perf_events must ensure the watchdog can
always fire, regardless of the interrupt masking situation.

It looks like one way of solving the problem would be to mark the NMI watchdog
event as immune to throttling. Since the event is internal to the kernel, we
could trust the event setup from perf_event_create_kernel_counter().
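
As a rough illustration of the idea, the user-space sketch below adds a
per-event flag that bypasses the rate check. The no_throttle field and all
names here are hypothetical; nothing like it exists in the current code, and
HZ=1000 and the sample rate passed in are assumptions for the model only.

```c
#include <stdint.h>

#define HZ 1000                 /* assumed timer tick rate */
#define MAX_INTERRUPTS (~0ULL)  /* sentinel value meaning "event is throttled" */

/* Illustrative event state; no_throttle is a hypothetical flag. */
struct sim_event {
    uint64_t interrupts;        /* overflows seen since the last timer tick */
    int no_throttle;            /* set only for trusted in-kernel events */
};

/* Rate check that skips throttling for events marked as immune. */
static int sim_overflow(struct sim_event *ev, uint64_t max_sample_rate)
{
    if (ev->no_throttle)
        return 0;
    if (ev->interrupts == MAX_INTERRUPTS)
        return 1;
    ev->interrupts++;
    if ((uint64_t)HZ * ev->interrupts > max_sample_rate) {
        ev->interrupts = MAX_INTERRUPTS;
        return 1;
    }
    return 0;
}

/*
 * Simulate a deadlock with interrupts masked: the interrupt count is never
 * reset between NMIs. Returns the NMI number at which the event was
 * throttled, or 0 if all nmis_needed NMIs were delivered.
 */
static int sim_deadlock(int no_throttle, uint64_t max_sample_rate,
                        int nmis_needed)
{
    struct sim_event ev = { 0, no_throttle };

    for (int n = 1; n <= nmis_needed; n++) {
        if (sim_overflow(&ev, max_sample_rate))
            return n;
    }
    return 0;
}
```

In this model, sim_deadlock(0, 2000, 5) is throttled at the third NMI, while
sim_deadlock(1, 2000, 5) delivers all five, which is the behavior the
watchdog needs.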