Re: [patch V4 11/14] PCI/MSI: Provide a sane mechanism for TPH
From: Bjorn Helgaas
Date: Tue Jun 17 2025 - 19:23:07 EST
On Wed, Mar 19, 2025 at 11:56:57AM +0100, Thomas Gleixner wrote:
> The PCI/TPH driver fiddles with the MSI-X control word of an active
> interrupt completely unserialized against concurrent operations issued
> from the interrupt core. It also brings the PCI/MSI-X internal cached
> control word out of sync.
>
> Provide a function, which has the required serialization and keeps the
> control word cache in sync.
>
> Unfortunately this requires looking up and locking the interrupt descriptor,
> which should only be done in the interrupt core code. But confining this
> particular oddity to the PCI/MSI core is the lesser of all evils. An
> interrupt core implementation would require a larger pile of infrastructure
> and indirections for dubious value.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Cc: Wei Huang <wei.huang2@xxxxxxx>
> Cc: linux-pci@xxxxxxxxxxxxxxx
>
>
>
> ---
> drivers/pci/msi/msi.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> drivers/pci/pci.h | 9 +++++++++
> 2 files changed, 56 insertions(+)
>
> --- a/drivers/pci/msi/msi.c
> +++ b/drivers/pci/msi/msi.c
> @@ -910,6 +910,53 @@ void pci_free_msi_irqs(struct pci_dev *d
> }
> }
>
> +#ifdef CONFIG_PCIE_TPH
> +/**
> + * pci_msix_write_tph_tag - Update the TPH tag for a given MSI-X vector
> + * @pdev: The PCIe device to update
> + * @index: The MSI-X index to update
> + * @tag: The tag to write
> + *
> + * Returns: 0 on success, error code on failure
> + */
> +int pci_msix_write_tph_tag(struct pci_dev *pdev, unsigned int index, u16 tag)
> +{
> +	struct msi_desc *msi_desc;
> +	struct irq_desc *irq_desc;
> +	unsigned int virq;
> +
> +	if (!pdev->msix_enabled)
> +		return -ENXIO;
> +
> +	guard(msi_descs_lock)(&pdev->dev);
> +	virq = msi_get_virq(&pdev->dev, index);
> +	if (!virq)
> +		return -ENXIO;
> +	/*
> +	 * This is a horrible hack, but short of implementing a PCI
> +	 * specific interrupt chip callback and a huge pile of
> +	 * infrastructure, this is the minor nuisance. It provides the
> +	 * protection against concurrent operations on this entry and keeps
> +	 * the control word cache in sync.
> +	 */
> +	irq_desc = irq_to_desc(virq);
> +	if (!irq_desc)
> +		return -ENXIO;
> +
> +	guard(raw_spinlock_irq)(&irq_desc->lock);
> +	msi_desc = irq_data_get_msi_desc(&irq_desc->irq_data);
> +	if (!msi_desc || msi_desc->pci.msi_attrib.is_virtual)
> +		return -ENXIO;
> +
> +	msi_desc->pci.msix_ctrl &= ~PCI_MSIX_ENTRY_CTRL_ST;
> +	msi_desc->pci.msix_ctrl |= FIELD_PREP(PCI_MSIX_ENTRY_CTRL_ST, tag);
> +	pci_msix_write_vector_ctrl(msi_desc, msi_desc->pci.msix_ctrl);
> +	/* Flush the write */
> +	readl(pci_msix_desc_addr(msi_desc));
> +	return 0;
> +}
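
The cached-word update at the end of the function can be sketched in
userspace as below. This is only an illustration: struct fake_msix_entry
and write_tph_tag() are hypothetical names, the 0xffff0000 mask is assumed
to match PCI_MSIX_ENTRY_CTRL_ST, and a plain store stands in for the MMIO
write and the flushing readl(). The locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors PCI_MSIX_ENTRY_CTRL_ST (steering tag in bits 31:16). */
#define CTRL_ST_MASK	0xffff0000u
#define CTRL_ST_SHIFT	16

/* Hypothetical stand-in for one MSI-X table entry and its software cache. */
struct fake_msix_entry {
	uint32_t cached_ctrl;	/* plays the role of msi_desc->pci.msix_ctrl */
	uint32_t hw_ctrl;	/* plays the role of the vector control register */
};

/*
 * Update only the steering tag bits in the cached control word, then
 * push the whole cached word to the "hardware" so both stay identical.
 * Bits outside the tag field (for example the mask bit) are preserved.
 */
static void write_tph_tag(struct fake_msix_entry *e, uint16_t tag)
{
	assert(e != NULL);

	e->cached_ctrl &= ~CTRL_ST_MASK;
	e->cached_ctrl |= ((uint32_t)tag << CTRL_ST_SHIFT) & CTRL_ST_MASK;
	e->hw_ctrl = e->cached_ctrl;	/* a plain store replaces writel() here */
}
```

Calling write_tph_tag() on an entry whose cached word is 0x12340001 with
tag 0xabcd leaves both words at 0xabcd0001, so the mask bit survives and
the cache matches what was written.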

Looks like this change might add this warning, which I don't claim to
understand:

  $ make C=2 drivers/pci/msi/msi.o
  drivers/pci/msi/msi.c:928:5: warning: context imbalance in 'pci_msix_write_tph_tag' - wrong count at exit

This appeared in v6.16-rc1 as d5124a9957b2 ("PCI/MSI: Provide a sane
mechanism for TPH")
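
A possible explanation, offered as an assumption rather than a diagnosis:
guard() releases its lock through the compiler's cleanup attribute when
the variable goes out of scope, so no explicit unlock appears on the early
"return -ENXIO" paths. If sparse does not model that implicit release, it
would count an acquire without a matching release. The userspace sketch
below shows the mechanism only. DEMO_GUARD, struct demo_lock and
guarded_op() are hypothetical names, not kernel API, and the cleanup
attribute needs GCC or Clang.

```c
#include <assert.h>

/* Hypothetical toy lock: just tracks whether it is currently held. */
struct demo_lock {
	int held;
	int releases;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);	/* no nesting in this toy model */
	l->held = 1;
}

/* Cleanup handler: receives a pointer to the guard variable itself. */
static void demo_lock_release(struct demo_lock **lp)
{
	(*lp)->held = 0;
	(*lp)->releases++;
}

/*
 * Acquire the lock now and arrange for demo_lock_release() to run when
 * the hidden guard variable leaves scope, on every return path.
 */
#define DEMO_GUARD(l)							\
	struct demo_lock *demo_guard_					\
		__attribute__((cleanup(demo_lock_release), unused)) =	\
		(demo_lock_acquire(l), (l))

static int guarded_op(struct demo_lock *l, int fail_early)
{
	DEMO_GUARD(l);

	if (fail_early)
		return -1;	/* lock is dropped here without an explicit call */

	return 0;		/* and here as well */
}
```

Both return paths in guarded_op() leave the lock released even though the
source never names an unlock, which is the pattern a checker that only
follows explicit acquire and release annotations could misread.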