Re: [PATCH] x86/hotplug: Silence APIC only after all irq's are migrated

From: Evan Green
Date: Mon Aug 17 2020 - 14:13:26 EST


Hi Ashok,
Thank you, Srikanth, and Sukumar for some very impressive debugging here.

On Fri, Aug 14, 2020 at 2:38 PM Ashok Raj <ashok.raj@xxxxxxxxx> wrote:
>
> When offlining CPU's, fixup_irqs() migrates all interrupts away from the
> outgoing CPU to an online CPU. Its always possible the device sent an
> interrupt to the previous CPU destination. Pending interrupt bit in IRR in
> lapic identifies such interrupts. apic_soft_disable() will not capture any
> new interrupts in IRR. This causes interrupts from device to be lost during
> cpu offline. The issue was found when explicitly setting MSI affinity to a
> CPU and immediately offlining it. It was simple to recreate with a USB
> ethernet device and doing I/O to it while the CPU is offlined. Lost
> interrupts happen even when Interrupt Remapping is enabled.
>
> Current code does apic_soft_disable() before migrating interrupts.
>
> native_cpu_disable()
> {
> ...
> apic_soft_disable();
> cpu_disable_common();
> --> fixup_irqs(); // Too late to capture anything in IRR.
> }
>
> Just fliping the above call sequence seems to hit the IRR checks
> and the lost interrupt is fixed for both legacy MSI and when
> interrupt remapping is enabled.
>
>
> Fixes: 60dcaad5736f ("x86/hotplug: Silence APIC and NMI when CPU is dead")
> Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@xxxxxxxxxxxxxxxxxxxxxxx/
> Signed-off-by: Ashok Raj <ashok.raj@xxxxxxxxx>
>
> To: linux-kernel@xxxxxxxxxxxxxxx
> To: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Sukumar Ghorai <sukumar.ghorai@xxxxxxxxx>
> Cc: Srikanth Nandamuri <srikanth.nandamuri@xxxxxxxxx>
> Cc: Evan Green <evgreen@xxxxxxxxxxxx>
> Cc: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
> ---
> arch/x86/kernel/smpboot.c | 11 +++++++++--
> 1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index ffbd9a3d78d8..278cc9f92f2f 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1603,13 +1603,20 @@ int native_cpu_disable(void)
> if (ret)
> return ret;
>
> + cpu_disable_common();
> /*
> * Disable the local APIC. Otherwise IPI broadcasts will reach
> * it. It still responds normally to INIT, NMI, SMI, and SIPI
> - * messages.

I'm slightly unclear about whether interrupts are disabled at the core
by this point or not. I followed native_cpu_disable() up to
__cpu_disable(), up to take_cpu_down(). This is passed into a call to
stop_machine_cpuslocked(), where interrupts get disabled at the core.
So unless there's another path, it seems like interrupts are always
disabled at the core by this point.

If interrupts are always disabled, then the comment above is a little
obsolete, since we're not expecting to receive broadcast IPIs from
here on out anyway. We could clean up that comment in this change.

If there is a path to here where interrupts are still enabled at the
core, then we'd need to watch out, because this change now allows
broadcast IPIs to come in during cpu_disable_common(). That could be
bad. But I think that's not this case, so this should be ok.

> + * messages. Its important to do apic_soft_disable() after
> + * fixup_irqs(), because fixup_irqs() called from cpu_disable_common()
> + * depends on IRR being set. After apic_soft_disable() CPU preserves
> + * currently set IRR/ISR but new interrupts will not set IRR.
> + * This causes interrupts sent to outgoing cpu before completion
> + * of irq migration to be lost. Check SDM Vol 3 "10.4.7.2 Local
> + * APIC State after It Has been Software Disabled" section for more
> + * details.
> */
> apic_soft_disable();
> - cpu_disable_common();
>
> return 0;
> }
> --
> 2.13.6
>