Re: MSI irqchip configured as IRQCHIP_ONESHOT_SAFE causes spurious IRQs

From: Ramon Fried
Date: Thu Jan 16 2020 - 02:58:44 EST


On Thu, Jan 16, 2020 at 3:39 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Ramon,
>
> Ramon Fried <rfried.dev@xxxxxxxxx> writes:
> > On Wed, Jan 15, 2020 at 12:54 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> Ramon Fried <rfried.dev@xxxxxxxxx> writes:
> >> Due to the semantics of MSI this is perfectly fine and aside of your
> >> problem this has worked perfectly fine so far and it's an actual
> >> performance win because it avoid fiddling with the MSI mask which is
> >> slow.
> >>
> > fiddling with MSI masks is a configuration space write, which is
> > non-posted, so it does come with a price.
> > The question is if a test was ever conducted to see the it's better
> > than spurious IRQ's.
>
> The point is that there are no spurious interrupts in the sane cases and
> the tests we did showed a real performance improvements in high
> frequency interrupt situations due to avoiding the config space access.
>
> Please stop claiming that this spurious interrupt problem is there by
> design. It's not. Read the MSI spec.
>
> Also boot your laptop/workstation with 'threadirqs' on the kernel
> command line and check how many spurious interrupts come in. On a test
> machine which has that command line parameter set I see exactly ONE with
> an uptime of several days and heavy MSI interrupt activity. The ONE is
> even there without 'threadirqs' on the command line, so I really can't
> be bothered to analyze that.
>
> >> You still have not told which driver/hardware is affected by this. Can
> >> you please provide that information so we can finally look at the actual
> >> hardware/driver combo?
> >>
> > Sure,
> > I'm writing an MSI IRQ controller, it's basically a MIPS GIC interrupt
> > line which several MSI are multiplexed on it.
>
> I assume you write the driver, not the VHDL for the actual hardware,
> right? If so, you still did not tell which hardware that is and where we
> can find information about it.
There's no official information I can share but I can explain how it works:
Basically, 32 MSI vectors are represented by a single GIC irq.
There's a status registers which every bit correspond to an MSI vector, and
individual MSI needs to be acked on that registers. in any case where
there's asserted bit
the GIC IRQ level is high.

>
> I further assume that 'multiplexed' means that the hardware is something
> like an MSI receiver on the CPU/chipset which handles multiple MSI
> messages and forwards them to a single shared interrupt line on the MIPS
> GIC. Right?
Yes.
>
> Can you please provide a pointer to the hardware documentation?
There's no official documentation for that.
>
> > It's configured with handle_level_irq() as the GIC is level IRQ.
>
> Which is completely bonkers. MSI has edge semantics and sharing an
> interrupt line for edge type interrupts is broken by design, unless the
> hardware which handles the incoming MSIs and forwards them to the level
> type interrupt line is designed properly and the driver does the right
> thing.
Yes, the design of the HW is sort of broken. I concur.
>
> > The ack callback acks the GIC irq. the mask/unmask calls
> > pci_msi_mask_irq() / pci_msi_unmask_irq()
>
> What? How is that supposed to work with multiple MSIs?
Acking is per MSI vector as I described above, so it should work.
>
> Either the hardware is a trainwreck or the driver or both.
>
> I can't tell as I can't find my crystal ball. Maybe I should replace it
> with an Mobileye :)
:)
>
> Thanks,
>
> tglx