Re: [tip: irq/urgent] PCI/MSI: Mask MSI-X vectors only on success

From: Thomas Gleixner
Date: Wed Apr 27 2022 - 13:36:03 EST


On Wed, Apr 27 2022 at 09:59, Salvatore Bonaccorso wrote:
> On Mon, Mar 14, 2022 at 09:29:53PM +0100, Jeremi Piotrowski wrote:
>
> Does someone knows here on current state of the AWS instances using
> the older Xen version which causes the issues?
>
> AFAIU upstream is not planning to revert 83dbf898a2d4 ("PCI/MSI: Mask
> MSI-X vectors only on success") as it fixed a bug. Now several
> downstream distros do carry a revert of this commit, which I believe
> is an unfortunate situation and wonder if this can be addressed
> upstream to deal with the AWS m4.large instance issues.

The problem is that except for a bisect result we've not seen much
information which might help to debug and analyze the problem.

The guest uses MSI-X on that NIC:

Capabilities: [70] MSI-X: Enable+ Count=3 Masked-

So looking at the commit in question:

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 48e3f4e47b29..6748cf9d7d90 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -722,9 +722,6 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
goto out_disable;
}

- /* Ensure that all table entries are masked. */
- msix_mask_all(base, tsize);
-
ret = msix_setup_entries(dev, base, entries, nvec, affd);
if (ret)
goto out_disable;
@@ -751,6 +748,16 @@ static int msix_capability_init(struct pci_dev *dev, struct msix_entry *entries,
/* Set MSI-X enabled bits and unmask the function */
pci_intx_for_msi(dev, 0);
dev->msix_enabled = 1;
+
+ /*
+ * Ensure that all table entries are masked to prevent
+ * stale entries from firing in a crash kernel.
+ *
+ * Done late to deal with a broken Marvell NVME device
+ * which takes the MSI-X mask bits into account even
+ * when MSI-X is disabled, which prevents MSI delivery.
+ */
+ msix_mask_all(base, tsize);
pci_msix_clear_and_set_ctrl(dev, PCI_MSIX_FLAGS_MASKALL, 0);

IOW, it moves the invocation of msix_mask_all() into the success
path.

As the device uses MSI-X this change does not make any difference from a
hardware perspective simply because _all_ MSI-X interrupts are masked
via the CTRL register already. And it does not matter whether the kernel
masks them individually _before_ or _after_ the allocation. At least not
on real hardware and on a sane emulation.

Now this is XEN and neither real hardware nor sane emulation.

It must be a XEN_HVM guest because XEN_PV guests disable PCI/MSI[-X]
completely which makes the invocation of msix_mask_all() a NOP.

If it's not a XEN_HVM guest, then you can stop reading further as I'm
unable to decode why moving a NOP makes a difference. That belongs in to
the realm of voodoo, but then XEN is voodoo at least for me. :)

XEN guests do not use the common PCI mask/unmask machinery which would
unmask the interrupt on request_irq().

So I assume that the following happens:

Guest Hypervisor

msix_capabilities_init()
....
alloc_irq()
xen_magic() -> alloc_msix_interrupt()
request_irq()

msix_mask_all() -> trap
do_magic()
request_irq()
unmask()
xen_magic()
unmask_evtchn() -> do_more_magic()

So I assume further that msix_mask_all() actually is able to mask the
interrupts in the hardware (ctrl word of the vector table) despite the
hypervisor having allocated and requested the interrupt already.

Nothing in XEN_HVM handles PCI/MSI[-X] mask/unmask in the guest, so I
really have to ask why XEN_HVM does not disable PCI/MSI[-X] masking like
XEN_PV does. I can only assume the answer is voodoo...

Maybe the XEN people have some more enlightened answers to that.

Thanks,

tglx