Re: [PATCH] irq: add support for warning on long-running IRQ handlers

From: Thomas Gleixner
Date: Mon Jun 30 2025 - 11:42:27 EST


On Mon, Jun 30 2025 at 14:46, Wladislav Wiebe wrote:

The subsystem prefix is 'genirq:' See

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#patch-subject

> Introduce a new option CONFIG_IRQ_LATENCY_WARN that enables warnings when
> IRQ handlers take an unusually long time to execute.
>
> When triggered, the warning includes the CPU, IRQ number, handler address,
> name, and execution duration, for example:
>
> [CPU0] latency on IRQ[787:bad_irq_handler+0x1/0x34 [bad_irq]], took: 5 jiffies (~50 ms)
>
> To keep runtime overhead minimal, this implementation uses a jiffies-based
> timing mechanism. While coarse, it is sufficient to detect problematic IRQs.

Define sufficient. That really depends on your use case. For a real-time
system a hard interrupt handler running longer than a few microseconds
can be problematic.

So instead of adding some single purpose mechanism, can we please add
something flexible which can be used for a wide range of scenarios.

> +#ifdef CONFIG_IRQ_LATENCY_WARN
> +static inline void warn_on_irq_latency(struct irqaction *action, unsigned int irq,
> + unsigned long jiffies_start)

latency is the wrong term here. This is about the runtime, duration of
the handler.

> +{
> + unsigned long delta = jiffies - jiffies_start;
> +
> + /*
> + * Warn about long IRQ handler latency only if jiffies are reliable.

What means jiffies are reliable?

> + * The reporting condition hits only when there are at least two CPUs
> + * with active ticks.
> + * Jiffies updates are stalled on this CPU until MAX_STALLED_JIFFIES
> + * reaches and a force update happens on another CPU with active ticks.
> + */
> + if (unlikely(delta >= (MAX_STALLED_JIFFIES + CONFIG_IRQ_LATENCY_WARN_THRESHOLD))) {
> + pr_warn_ratelimited("[CPU%d] latency on IRQ[%u:%pS], took: %lu jiffies (~%u ms)\n",
> + smp_processor_id(), irq, action->handler,
> + delta, jiffies_to_msecs(delta));
> + }
> +}
> +#else
> +static inline void warn_on_irq_latency(struct irqaction *action, unsigned int irq,
> + unsigned long jiffies_start) { }
> +#endif

I'm absolutely not fond of this #ifdeffery and yet more Kconfig
knobs. Something like this should just work:

DEFINE_STATIC_KEY_FALSE(handler_duration_check);

if (static_branch_unlikely(&handler_duration_check))
ts_start = local_clock();
res = action->handler(irq, action->dev_id);
if (static_branch_unlikely(&handler_duration_check))
check_handler_duration(ts_start);

Then have a command-line option which allows the user to set a warning
threshold > 0 in microseconds. When the option is evaluated the
threshold is stored in a __ro_after_init variable and the static branch
is enabled.

Hmm?

Thanks,

tglx