RE: [PATCH 14/16] PCI: hv: Switch to msi_create_parent_irq_domain()
From: Michael Kelley
Date: Fri Jul 04 2025 - 00:58:55 EST
From: Nam Cao <namcao@xxxxxxxxxxxxx> Sent: Thursday, July 3, 2025 9:33 PM
>
> On Fri, Jul 04, 2025 at 02:27:01AM +0000, Michael Kelley wrote:
> > I haven't resolved the conflict. As a shortcut for testing I just
> > removed the conflicting patch since it is for a Microsoft custom NIC
> > ("MANA") that's not in the configuration I'm testing with. I'll have to
> > look more closely to figure out the resolution.
> >
> > Separately, this patch (the switch to msi_create_parent_irq_domain())
> > isn't working for Linux VMs on Hyper-V on ARM64. The initial symptom
> > is that interrupts from the NVMe controller aren't getting handled
> > and everything hangs. Here's the dmesg output:
> >
> > [ 84.463419] hv_vmbus: registering driver hv_pci
> > [ 84.463875] hv_pci abee639e-0b9d-49b7-9a07-c54ba8cd5734: PCI VMBus probing: Using version 0x10004
> > [ 84.464518] hv_pci abee639e-0b9d-49b7-9a07-c54ba8cd5734: PCI host bridge to bus 0b9d:00
> > [ 84.464529] pci_bus 0b9d:00: root bus resource [mem 0xfc0000000-0xfc00fffff window]
> > [ 84.464531] pci_bus 0b9d:00: No busn resource found for root bus, will use [bus 00-ff]
> > [ 84.465211] pci 0b9d:00:00.0: [1414:b111] type 00 class 0x010802 PCIe Endpoint
> > [ 84.466657] pci 0b9d:00:00.0: BAR 0 [mem 0xfc0000000-0xfc00fffff 64bit]
> > [ 84.481923] pci_bus 0b9d:00: busn_res: [bus 00-ff] end is updated to 00
> > [ 84.481936] pci 0b9d:00:00.0: BAR 0 [mem 0xfc0000000-0xfc00fffff 64bit]: assigned
> > [ 84.482413] nvme nvme0: pci function 0b9d:00:00.0
> > [ 84.482513] nvme 0b9d:00:00.0: enabling device (0000 -> 0002)
> > [ 84.556871] irq 17, desc: 00000000e8529819, depth: 0, count: 0, unhandled: 0
> > [ 84.556883] ->handle_irq(): 0000000062fa78bc, handle_bad_irq+0x0/0x270
> > [ 84.556892] ->irq_data.chip(): 00000000ba07832f, 0xffff00011469dc30
> > [ 84.556895] ->action(): 0000000069f160b3
> > [ 84.556896] ->action->handler(): 00000000e15d8191, nvme_irq+0x0/0x3e8
> > [ 172.307920] watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [kworker/6:1H:195]
>
> Thanks for the report.
>
> On arm64, this driver relies on the parent irq domain to set the flow
> handler, so the driver must not overwrite it with NULL.
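>
> For context, here is a simplified sketch of the core behavior that
> bites us (paraphrasing __irq_do_set_handler() in kernel/irq/chip.c,
> not quoting it verbatim): when a NULL flow handler is passed down,
> the core substitutes handle_bad_irq(), which is exactly what the
> dmesg above shows:
>
> 	/* kernel/irq/chip.c, __irq_do_set_handler(), simplified */
> 	if (!handle)
> 		handle = handle_bad_irq;
> 	...
> 	desc->handle_irq = handle;	/* overwrites the handler the
> 					 * parent domain installed */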
>
> This should cure it:
>
> diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> index 3a24fadddb83..f4a435b0456c 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -577,8 +577,6 @@ static void hv_pci_onchannelcallback(void *context);
>
> #ifdef CONFIG_X86
> #define DELIVERY_MODE APIC_DELIVERY_MODE_FIXED
> -#define FLOW_HANDLER handle_edge_irq
> -#define FLOW_NAME "edge"
>
> static int hv_pci_irqchip_init(void)
> {
> @@ -723,8 +721,6 @@ static void hv_arch_irq_unmask(struct irq_data *data)
> #define HV_PCI_MSI_SPI_START 64
> #define HV_PCI_MSI_SPI_NR (1020 - HV_PCI_MSI_SPI_START)
> #define DELIVERY_MODE 0
> -#define FLOW_HANDLER NULL
> -#define FLOW_NAME NULL
> #define hv_msi_prepare NULL
>
> struct hv_pci_chip_data {
> @@ -2162,8 +2158,9 @@ static int hv_pcie_domain_alloc(struct irq_domain *d, unsigned int virq, unsigne
> return ret;
>
> for (int i = 0; i < nr_irqs; i++) {
> - irq_domain_set_info(d, virq + i, 0, &hv_msi_irq_chip, NULL, FLOW_HANDLER, NULL,
> - FLOW_NAME);
> + irq_domain_set_hwirq_and_chip(d, virq + i, 0, &hv_msi_irq_chip, NULL);
> + if (IS_ENABLED(CONFIG_X86))
> + __irq_set_handler(virq + i, handle_edge_irq, 0, "edge");
> }
>
> return 0;
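>
> (A note on the design choice, with the assumption that the arm64
> parent here is the GIC SPI domain: that parent installs its own flow
> handler, handle_fasteoi_irq(), when the SPI is allocated, so the
> child domain only needs to set the chip. On x86 no parent supplies a
> flow handler for these interrupts, hence the explicit
> handle_edge_irq.)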

Yes, that fixes the problem. Linux now boots, and the PCI NIC VF and two
NVMe controllers are visible and operational. Thanks for the fix! It
would have taken me a while to figure it out.

I want to do some additional testing tomorrow and look more closely at
the code, but I now have something that works well enough to make
further progress.

Michael