Re: [PATCH] x86/hyperv: Enable 15-bit APIC ID if the hypervisor supports it

From: David Woodhouse
Date: Tue Nov 03 2020 - 03:02:50 EST


On Mon, 2020-11-02 at 17:11 -0800, Dexuan Cui wrote:
> When a Linux VM runs on Hyper-V, if the VM has CPUs with >255 APIC IDs,
> the CPUs can't be the destination of IOAPIC interrupts, because the
> IOAPIC RTE's Dest Field has only 8 bits. Currently the hackery driver
> drivers/iommu/hyperv-iommu.c is used to ensure IOAPIC interrupts are
> only routed to CPUs that don't have >255 APIC IDs. However, there is
> an issue with kdump, because the kdump kernel can run on any CPU, and
> hence IOAPIC interrupts can't work if the kdump kernel runs on a CPU
> with a >255 APIC ID.
>
> The kdump issue can be fixed by the Extended Dest ID, which is introduced
> recently by David Woodhouse (for IOAPIC, see the field virt_destid_8_14 in
> struct IO_APIC_route_entry). Of course, the Extended Dest ID needs the
> support of the underlying hypervisor. The latest Hyper-V has added the
> support recently: with this commit, on such a Hyper-V host, Linux VM
> does not use hyperv-iommu.c because hyperv_prepare_irq_remapping()
> returns -ENODEV; instead, Linux kernel's generic support of Extended Dest
> ID from David is used, meaning that Linux VM is able to support up to
> 32K CPUs, and IOAPIC interrupts can be routed to all the CPUs.
>
> On an old Hyper-V host that doesn't support the Extended Dest ID, nothing
> changes with this commit: Linux VM is still able to bring up the CPUs with
> >255 APIC IDs with the help of hyperv-iommu.c, but IOAPIC interrupts still
> can not go to such CPUs, and the kdump kernel still can not work properly
> on such CPUs.
>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>

Acked-by: David Woodhouse <dwmw@xxxxxxxxxxxx>

> +/*
> + * If ms_hyperv_msi_ext_dest_id() returns true, hyperv_prepare_irq_remapping()
> + * returns -ENODEV and the Hyper-V IOMMU driver is not used; instead, the
> + * generic support of the 15-bit APIC ID is used: see __irq_msi_compose_msg().
> + *
> + * Note: For a VM on Hyper-V, no emulated legacy device supports PCI MSI/MSI-X,
> + * and PCI MSI/MSI-X only come from the assigned physical PCIe device, and the
> + * PCI MSI/MSI-X interrupts are handled by the pci-hyperv driver. Here despite
> + * the word "msi" in the name "msi_ext_dest_id", actually the callback only
> + * affects how IOAPIC interrupts are routed.
> + */

I named it like that on purpose to make the point that the I/OAPIC is
just a device for turning line interrupts into MSIs. Some VMMs, just
like real hardware, really do implement their I/OAPIC emulation that
way. It makes a lot of sense to do so if you support interrupt
remapping.
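
To make that concrete, here is a minimal sketch of how a 64-bit I/OAPIC
RTE maps onto the low 32 bits of an MSI address when the 15-bit
Extended Dest ID is in use. It is not the kernel's code: the function
and macro names are hypothetical, and the bit positions are an
assumption based on the layout of struct IO_APIC_route_entry and the
x86 MSI address format (destid_0_7 in address bits 19:12,
virt_destid_8_14 in bits 11:5).

```c
#include <stdint.h>

/* Hypothetical names; bit positions assumed from the x86 MSI address format. */
#define MSI_ADDR_BASE        0xFEE00000u
#define MSI_ADDR_DEST_SHIFT  12   /* destid_0_7: address bits 19:12 */
#define MSI_ADDR_EXT_SHIFT   5    /* virt_destid_8_14: address bits 11:5 */
#define MSI_ADDR_DM_LOGICAL  (1u << 2)

/* Compose the low MSI address word for a 15-bit APIC ID. */
static uint32_t msi_addr_lo_for_apicid(uint32_t apicid, int dest_logical)
{
	uint32_t addr = MSI_ADDR_BASE;

	addr |= (apicid & 0xFFu) << MSI_ADDR_DEST_SHIFT;        /* bits 0-7 */
	addr |= ((apicid >> 8) & 0x7Fu) << MSI_ADDR_EXT_SHIFT;  /* bits 8-14 */
	if (dest_logical)
		addr |= MSI_ADDR_DM_LOGICAL;
	return addr;
}

/*
 * Treat an RTE as "just an MSI": pull the destination fields out of the
 * 64-bit entry (destid_0_7 in bits 63:56, virt_destid_8_14 in bits 55:49,
 * destination mode in bit 11) and build the same address word.
 */
static uint32_t rte_to_msi_addr_lo(uint64_t rte)
{
	uint32_t destid_0_7 = (uint32_t)(rte >> 56) & 0xFFu;
	uint32_t destid_8_14 = (uint32_t)(rte >> 49) & 0x7Fu;
	int dest_logical = (int)((rte >> 11) & 1u);

	return msi_addr_lo_for_apicid(destid_0_7 | (destid_8_14 << 8),
				      dest_logical);
}

/* Inverse: recover the 15-bit APIC ID from the low MSI address word. */
static uint32_t apicid_from_msi_addr_lo(uint32_t addr)
{
	return ((addr >> MSI_ADDR_DEST_SHIFT) & 0xFFu) |
	       (((addr >> MSI_ADDR_EXT_SHIFT) & 0x7Fu) << 8);
}
```

With 8 + 7 = 15 destination bits, APIC IDs up to 0x7FFF are
addressable, which is where the "up to 32K CPUs" figure comes from.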

FWIW I might have phrased your last paragraph in that comment as

Note: for a VM on Hyper-V, the I/OAPIC is the only device which
(logically) generates MSIs directly to the system APIC irq domain.
There is no HPET, and PCI MSI/MSI-X interrupts are remapped by the
pci-hyperv host bridge.

But don't bother to change it; I think I've made my point quite well
enough with https://git.kernel.org/tip/tip/c/5d5a97133 :)

--
dwmw2

