Re: [PROBLEM] Frequently get "irq 31: nobody cared" when passing through 2x GPUs that share same pci switch via vfio

From: Alex Williamson
Date: Mon Nov 29 2021 - 12:58:39 EST


On Wed, 24 Nov 2021 18:52:16 +1300
Matthew Ruffell <matthew.ruffell@xxxxxxxxxxxxx> wrote:

> Hi Alex,
>
> I have forward ported your patch to 5.16-rc2 to account for the vfio module
> refactor that happened recently. Attached below.
>
> Have you had an opportunity to research if it is possible to conditionalise
> clearing DisINTx by looking at the interrupt status and seeing if there is a
> pending interrupt but no handler set?

Sorry, I've not had any time to continue looking at this. When I last
left it, I had found that the interrupt bit in the status register was
not set prior to clearing INTxDisable in the command register, but the
status bit was set immediately upon clearing INTxDisable. That
suggests we could generalize re-masking INTx, since we know there's no
handler for it at this point, but it's not clear how this state gets
reported and cleared. More generally, should the interrupt code leave
INTx unmasked in any case where there's no handler? I'm not sure.

> We are testing a 5.16-rc2 kernel with the patch applied on Nathan's server
> currently, and we are also trying out the pci=clearmsi command line parameter
> that was discussed on linux-pci a few years ago in [1][2][3][4] along with
> setting snd-hda-intel.enable_msi=1 to see if it helps the crashkernel not get
> stuck copying IR tables.
>
> [1] https://marc.info/?l=linux-pci&m=153988799707413
> [2] https://lore.kernel.org/linux-pci/20181018183721.27467-1-gpiccoli@xxxxxxxxxxxxx/
> [3] https://lore.kernel.org/linux-pci/20181018183721.27467-2-gpiccoli@xxxxxxxxxxxxx/
> [4] https://lore.kernel.org/linux-pci/20181018183721.27467-3-gpiccoli@xxxxxxxxxxxxx/
>
> I will let you know how we get on.

Ok. I've not had any luck reproducing audio INTx issues, and trying to
test it has led me on several tangent bug hunts :-\ Thanks,

Alex