RE: irq_build_affinity_masks() allocates improper affinity if num_possible_cpus() > num_present_cpus()?

From: Dexuan Cui
Date: Tue Oct 06 2020 - 23:08:22 EST


> From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Sent: Tuesday, October 6, 2020 11:58 AM
> > ...
> > I pass through an MSI-X-capable PCI device to the Linux VM (which has
> > only 1 virtual CPU), and the below code does *not* report any error
> > (i.e. pci_alloc_irq_vectors_affinity() returns 2, and request_irq()
> > returns 0), but the code does not work: the second MSI-X interrupt is not
> > happening while the first interrupt does work fine.
> >
> > int nr_irqs = 2;
> > int i, nvec, irq, err;
> >
> > nvec = pci_alloc_irq_vectors_affinity(pdev, nr_irqs, nr_irqs,
> >                                       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, NULL);
>
> Why should it return an error?

The above code returns -ENOSPC if num_possible_cpus() is also 1, and
returns 2 if num_possible_cpus() is 128. So it looks like the above code
is not using the API correctly, and hence gets undefined results.
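For reference, below is a minimal sketch of the unmanaged variant I have in
mind (the function name test_alloc_vectors and the error handling are made
up here, not taken from the real driver): without PCI_IRQ_AFFINITY and with
min_vecs = 1, the allocation simply shrinks to what the device and the
system can support instead of depending on num_possible_cpus().

#include <linux/pci.h>

static int test_alloc_vectors(struct pci_dev *pdev)
{
	int nvec;

	/* Accept anything between 1 and 2 MSI-X vectors, no managed affinity. */
	nvec = pci_alloc_irq_vectors(pdev, 1, 2, PCI_IRQ_MSIX);
	if (nvec < 0)
		return nvec;	/* not even one vector could be allocated */

	/* Every returned vector can be requested and started as usual. */
	return nvec;
}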

> > for (i = 0; i < nvec; i++) {
> >         irq = pci_irq_vector(pdev, i);
> >         err = request_irq(irq, test_intr, 0, "test_intr", &intr_cxt[i]);
> > }
>
> And why do you expect that the second interrupt works?
>
> This is about managed interrupts and the spreading code has two vectors
> to which it can spread the interrupts. One is assigned to one half of
> the possible CPUs and the other one to the other half. Now you have only
> one CPU online so only the interrupt which has the online CPU in the
> assigned affinity mask is started up.
>
> That's how managed interrupts work. If you don't want managed interrupts
> then don't use them.
>
> Thanks,
>
> tglx

Thanks for the clarification! It looks like with PCI_IRQ_AFFINITY the kernel
guarantees that the allocated interrupts are 1:1 bound to CPUs, and
userspace is unable to change the affinities. This is very useful for
supporting per-CPU I/O queues.
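For completeness, here is a sketch of how I understand the per-CPU I/O queue
pattern (the test_queue* names are made up for illustration, and error
unwinding of already-requested vectors is omitted): min_vecs = 1 lets the
allocation shrink on small VMs, and each returned vector is a managed
interrupt whose affinity is fixed by the kernel.

#include <linux/interrupt.h>
#include <linux/pci.h>

struct test_queue {
	int index;
};

static irqreturn_t test_queue_intr(int irq, void *data)
{
	/* Per-queue handler; the queue runs on the CPU(s) in the managed mask. */
	return IRQ_HANDLED;
}

static int test_setup_queue_irqs(struct pci_dev *pdev, struct test_queue *queues,
				 unsigned int max_queues)
{
	int nvec, i, err;

	/* Let the core spread the vectors across the possible CPUs. */
	nvec = pci_alloc_irq_vectors_affinity(pdev, 1, max_queues,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      NULL);
	if (nvec < 0)
		return nvec;

	for (i = 0; i < nvec; i++) {
		/* Managed vector: its affinity cannot be changed from userspace. */
		err = request_irq(pci_irq_vector(pdev, i), test_queue_intr, 0,
				  "test_queue", &queues[i]);
		if (err)
			return err;
	}

	return nvec;
}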

Thanks,
-- Dexuan