Re: [BUG] perf_events: NMI watchdog event cannot be throttled

From: Stephane Eranian
Date: Thu Aug 19 2010 - 07:24:42 EST


Yeah, that should probably fix it. Let me try it out.


On Thu, Aug 19, 2010 at 1:05 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2010-08-18 at 22:26 +0200, Stephane Eranian wrote:
>> Hi,
>>
>> I ran into an issue with the NMI watchdog not firing in a deadlock
>> situation. After some debugging I found the source of the problem.
>>
>> The NMI watchdog is currently subject, like any other event, to interrupt
>> throttling. The heart of the problem is that if you are deadlocked on a CPU
>> with interrupts masked, the timer interrupt won't fire, and therefore the
>> hwc->interrupts field won't be reset. Then, depending on the max sampling
>> rate, you could eventually fail the max interrupt rate test in
>> __pfm_overflow_handler() and perf_events would throttle, i.e., stop, the
>> NMI watchdog event before the 5s delay to panic. Thus, you would never get
>> the panic. I ran into this problem myself.
>>
>> This is a serious issue because perf_events must ensure the watchdog can
>> always fire, regardless of the interrupt masking situation.
>>
>> Looks like one way of solving the problem would be to mark the NMI watchdog
>> event as immune to throttling. The event being internal to the kernel we could
>> trust the event setup from perf_event_create_kernel_counter().
>
> Something like so?
>
> ---
>  kernel/watchdog.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 613bc1f..e0fe6e4 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -206,6 +206,9 @@ void watchdog_overflow_callback(struct perf_event *event, int nmi,
>                  struct perf_sample_data *data,
>                  struct pt_regs *regs)
>  {
> +       /* Ensure the watchdog never gets throttled. */
> +       event->hw.interrupts = 0;
> +
>         if (__get_cpu_var(watchdog_nmi_touch) == true) {
>                 __get_cpu_var(watchdog_nmi_touch) = false;
>                 return;
>
>
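
For anyone following the thread, here is a rough user-space sketch of the
failure mode and of what the patch above changes. It is not kernel code: the
struct, the function names and the limit of 100 are all made up for
illustration, and it only assumes (as described above) that the overflow path
counts interrupts, throttles once the count exceeds a per-tick limit, and
normally relies on the timer tick to reset the count.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-event hardware state (hwc). */
struct model_hw_event {
	unsigned int interrupts;	/* overflows seen since the last timer tick */
	bool throttled;			/* true once the event has been stopped */
};

/* Assumed limit; the real one is derived from the max sampling rate. */
#define MODEL_MAX_INTERRUPTS_PER_TICK 100u

/*
 * Models one NMI overflow. Returns true if the watchdog callback ran.
 * The rate test happens first; the callback (and the reset added by the
 * patch) only runs if the event was not throttled.
 */
static bool model_overflow(struct model_hw_event *hw, bool reset_in_callback)
{
	if (hw->throttled)
		return false;

	hw->interrupts++;
	if (hw->interrupts > MODEL_MAX_INTERRUPTS_PER_TICK) {
		hw->throttled = true;	/* event is stopped, no more NMIs */
		return false;
	}

	/* Callback body: with the patch, the count is cleared on every NMI. */
	if (reset_in_callback)
		hw->interrupts = 0;

	return true;
}

/*
 * Models a CPU deadlocked with interrupts masked: NMIs keep arriving but
 * the timer tick never runs, so nothing else resets hw->interrupts.
 * Returns how many NMIs actually reached the callback.
 */
static unsigned int model_run(unsigned int nmis, bool reset_in_callback)
{
	struct model_hw_event hw = { 0, false };
	unsigned int delivered = 0;
	unsigned int i;

	for (i = 0; i < nmis; i++)
		if (model_overflow(&hw, reset_in_callback))
			delivered++;

	return delivered;
}
```

In this model, model_run(500, false) stops delivering callbacks once the
assumed limit is hit, while model_run(500, true) delivers all 500, which is
the behaviour the watchdog needs in order to reach the panic.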