Re: [External] [patch 0/4] genirq: Prevent migration live lock in handle_edge_irq()

From: Liangyan
Date: Mon Jul 21 2025 - 11:05:52 EST




On 2025/7/19 02:54, Thomas Gleixner wrote:
> Yicon reported and Liangyan debugged a live lock in handle_edge_irq()
> related to interrupt migration.
>
> For edge type interrupts, if the interrupt affinity is moved to a new
> target CPU while the interrupt is still being handled on the previous
> target CPU, the handler might get stuck on the previous target:
>
>    CPU 0 (previous target)        CPU 1 (new target)
>
>    handle_edge_irq()
>    repeat:
>         handle_event()            handle_edge_irq()
>                                   if (INPROGRESS) {
>                                     set(PENDING);
>                                     mask();
>                                     return;
>                                   }
>         if (PENDING) {
>           clear(PENDING);
>           unmask();
>           goto repeat;
>         }
>
> The migration in software never completes and CPU0 continues to handle the
> pending events forever. This happens when the device raises interrupts at
> a high rate, always before handle_event() completes and before the CPU0
> handler can clear INPROGRESS, so that CPU1 sets the PENDING flag over and
> over. This has been observed in virtual machines.
>
> The following series addresses this by making the new target CPU wait
> for the handler to complete on CPU0, thereby completing the software
> migration.
>
> A draft combo patch of this has been tested by Liangyan:
>
> https://lore.kernel.org/all/87o6u0rpaa.ffs@tglx
>
> The series splits up the draft patch and has proper changelogs.
>
> Thanks,
>
> tglx
> ---
> chip.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++--------
> internals.h | 6 ++---
> pm.c | 16 +++++---------
> spurious.c | 37 --------------------------------
> 4 files changed, 69 insertions(+), 58 deletions(-)
>
>
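
For anyone following the thread, the flag interaction in the quoted diagram
can be replayed with a small user-space model. This is only a sketch under
stated assumptions, not kernel code: struct irq_model, replay_burst() and the
"wait" parameter are hypothetical names, every edge is assumed to arrive on
CPU 1 while CPU 0 is still inside handle_event(), and the "wait" branch merely
approximates the idea from the cover letter (the new target waits for the old
handler to finish, then handles events itself). The real livelock is unbounded;
the burst is bounded here so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the per-interrupt state; not the kernel's struct irq_desc. */
struct irq_model {
    bool inprogress;   /* a handler is running on some CPU */
    bool pending;      /* an edge arrived while INPROGRESS was set */
    bool masked;       /* line is masked at the interrupt chip */
    int handled[2];    /* events handled on CPU 0 and on CPU 1 */
};

/*
 * Replay 'burst' further edges, each arriving on CPU 1 (new target) while
 * CPU 0 (previous target) is still inside handle_event().
 * 'wait' selects the hypothetical "new target waits" behaviour.
 */
static void replay_burst(struct irq_model *m, int burst, bool wait)
{
    bool cpu1_waiting = false;

    assert(burst >= 0);

    /* CPU 0 took the first edge before the affinity change took effect. */
    m->inprogress = true;
    for (;;) {
        m->handled[0]++;               /* CPU 0: handle_event() */

        if (burst > 0 && !cpu1_waiting) {
            burst--;                   /* next edge lands on CPU 1 */
            if (wait) {
                cpu1_waiting = true;   /* CPU 1: poll until !INPROGRESS */
            } else {
                m->pending = true;     /* CPU 1: set(PENDING) */
                m->masked = true;      /* CPU 1: mask(); return */
            }
        }

        if (!m->pending)
            break;
        m->pending = false;            /* CPU 0: clear(PENDING) */
        m->masked = false;             /* CPU 0: unmask() */
    }                                  /* CPU 0: goto repeat */
    m->inprogress = false;

    if (cpu1_waiting) {
        /* CPU 0 is done; CPU 1 handles its edge and all later ones. */
        m->inprogress = true;
        m->handled[1] += 1 + burst;
        m->inprogress = false;
    }
}
```

In this model, calling replay_burst() on a zeroed struct irq_model with
wait == false leaves every event accounted to CPU 0, while wait == true
leaves only the first event on CPU 0 and the rest on CPU 1.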

Tested-by: Liangyan <liangyan.peng@xxxxxxxxxxxxx>

Regards,
Liangyan