Re: PCI MSI issue with reinserting a driver

From: John Garry
Date: Tue Feb 02 2021 - 03:40:10 EST


On 01/02/2021 18:50, Marc Zyngier wrote:

Hi Marc,

Just a heads-up, by chance I noticed that I can't re-insert a specific
driver on v5.11-rc6:

[ 64.356023] hisi_dma 0000:7b:00.0: Adding to iommu group 31
[ 64.368627] hisi_dma 0000:7b:00.0: enabling device (0000 -> 0002)
[ 64.384156] hisi_dma 0000:7b:00.0: Failed to allocate MSI vectors!
[ 64.397180] hisi_dma: probe of 0000:7b:00.0 failed with error -28

That's with CONFIG_DEBUG_TEST_DRIVER_REMOVE=y

Bisect tells me that this is the first bad commit:
4615fbc3788d ("genirq/irqdomain: Don't try to free an interrupt that
has no mapping")

The relevant driver code is
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/dma/hisi_dma.c#n547

That driver only allocates 30 MSIs, so maybe the problem is that not
all 32 MSIs are allocated (and freed).
Are they Multi-MSI (and not MSI-X)?

Multi-MSI.


I'll have a bit more of a look tomorrow.
Here's my suspicion: two of the interrupts are mapped in the low-level
domain (the ITS, I'd expect in your case), but they have never been
mapped at the higher level.

On teardown, we only get rid of the 30 that were actually mapped, and
leave the last two dangling in the ITS domain, and thus the ITS device
resources are never freed. On reload, we request another 32
interrupts, which can't be satisfied for this device.

Assuming I got it right, the question is: why weren't these interrupts
mapped in the PCI domain in the first place? And if I got it wrong, I'm
even more curious!
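
The suspected sequence can be sketched as a small userspace model. This is
only an illustration of the hypothesis above, not kernel code: the names
toy_dev, toy_alloc, toy_free, toy_reserved and DEV_LPI_SPACE are made up,
the 32-entry per-device budget is an assumption taken from the numbers in
this thread, and the skip_unmapped flag merely stands in for "teardown
only touches interrupts that were mapped at the higher level".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEV_LPI_SPACE 32	/* assumed per-device budget (hypothetical) */

struct toy_dev {
	bool low_level[DEV_LPI_SPACE];	/* reserved in the low-level domain */
	bool mapped[DEV_LPI_SPACE];	/* also mapped at the higher level */
};

/* Multi-MSI blocks are a power of two, so 30 vectors occupy a block of 32. */
static unsigned int toy_roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Number of entries still held in the low-level domain. */
static unsigned int toy_reserved(const struct toy_dev *d)
{
	unsigned int i, n = 0;

	for (i = 0; i < DEV_LPI_SPACE; i++)
		n += d->low_level[i];
	return n;
}

/* Reserve a power-of-two block at the low level, map only nvec of them. */
static int toy_alloc(struct toy_dev *d, unsigned int nvec)
{
	unsigned int want, taken = 0, i;

	assert(nvec >= 1 && nvec <= DEV_LPI_SPACE);
	want = toy_roundup_pow2(nvec);

	if (DEV_LPI_SPACE - toy_reserved(d) < want)
		return -ENOSPC;		/* the -28 seen in the probe log */

	for (i = 0; i < DEV_LPI_SPACE && taken < want; i++) {
		if (d->low_level[i])
			continue;
		d->low_level[i] = true;
		d->mapped[i] = taken < nvec;
		taken++;
	}
	return (int)nvec;
}

/* Tear down; with skip_unmapped set, unmapped entries are left dangling. */
static void toy_free(struct toy_dev *d, bool skip_unmapped)
{
	unsigned int i;

	for (i = 0; i < DEV_LPI_SPACE; i++) {
		if (skip_unmapped && !d->mapped[i])
			continue;
		d->mapped[i] = false;
		d->low_level[i] = false;
	}
}
```

In this model, toy_alloc(&d, 30) followed by toy_free(&d, true) leaves two
entries reserved, so a second toy_alloc(&d, 30) returns -ENOSPC, while
toy_free(&d, false) releases the whole block and the second allocation
succeeds.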

Not sure. I also now notice an error for the SAS PCI driver on D06 when
nr_cpus < 16, which means the number of MSI vectors allocated is < 32,
so it looks like the same problem. There we try to allocate
16 + min(nr_cpus, 16) MSIs.
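
For the arithmetic, a rough sketch follows. It reads the SAS vector count
as 16 + min(nr_cpus, 16), which is what the "nr_cpus < 16 means fewer than
32 vectors" remark implies, and it assumes the Multi-MSI block is rounded
up to the next power of two. The helper names (sas_msi_vectors,
msi_block_size, msi_unmapped_tail) are hypothetical and do not exist in
the kernel.

```c
#include <assert.h>

/* Vectors requested by the SAS driver as described above (assumed formula). */
static unsigned int sas_msi_vectors(unsigned int nr_cpus)
{
	return 16 + (nr_cpus < 16 ? nr_cpus : 16);
}

/* Multi-MSI block size: next power of two that holds nvec vectors. */
static unsigned int msi_block_size(unsigned int nvec)
{
	unsigned int p = 1;

	assert(nvec >= 1 && nvec <= 32);	/* Multi-MSI tops out at 32 */
	while (p < nvec)
		p <<= 1;
	return p;
}

/* Entries in the block that the driver never asked for. */
static unsigned int msi_unmapped_tail(unsigned int nvec)
{
	return msi_block_size(nvec) - nvec;
}
```

With 8 CPUs this gives 24 requested vectors in a block of 32, so 8 entries
would never be mapped; with 16 or more CPUs the request is exactly 32 and
nothing is left over. The hisi_dma case (30 vectors) leaves 2.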

Anyway, let me have a look today to see what is going wrong.

cheers,
John