Re: [PATCH] genirq/msi: Check null pointer before copying struct msi_msg

From: Marc Zyngier
Date: Tue Apr 21 2020 - 04:39:36 EST


On Mon, 20 Apr 2020 09:08:27 -0700
Alan Mikhak <alan.mikhak@xxxxxxxxxx> wrote:

Alan,

> On Mon, Apr 20, 2020 at 2:14 AM Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On 2020-04-18 16:19, Gustavo Pimentel wrote:
> > > Hi Marc and Alan,
> > >
> > >> I'm not convinced by this. If you know that, by construction, these
> > >> interrupts are not associated with an underlying MSI, why calling
> > >> get_cached_msi_msg() the first place?
> > >>
> > >> There seem to be some assumptions in the DW EDMA driver that the
> > >> signaling would be MSI based, so maybe someone from Synopsys
> > >> (Gustavo?)
> > >> could clarify that. From my own perspective, running on an endpoint
> > >> device means that it is *generating* interrupts, and I'm not sure what
> > >> the MSIs represent here.
> > >
> > > Giving a little context to this topic.
> > >
> > > The eDMA IP present on the Synopsys DesignWare PCIe Endpoints can be
> > > configured and triggered *remotely* as well *locally*.
> > > For the sake of simplicity let's assume for now the eDMA was
> > > implemented
> > > on the EP and that is the IP that we want to configure and use.
> > >
> > > When I say *remotely* I mean that this IP can be configurable through
> > > the
> > > RC/CPU side, however, for that, it requires the eDMA registers to be
> > > exposed through a PCIe BAR on the EP. This will allow setting the SAR,
> > > DAR and other settings, also need(s) the interrupt(s) address(es) to be
> > > set as well (MSI or MSI-X only) so that it can signal through PCIe (to
> > > the RC and consecutively the associated EP driver) if the data transfer
> > > has been completed, aborted or if the Linked List consumer algorithm
> > > has
> > > passed in some linked element marked with a watermark.
> > >
> > > It was based on this case that the eDMA driver was exclusively
> > > developed.
> > >
> > > However, Alan, wants to expand a little more this, by being able to use
> > > this driver on the EP side (through
> > > pcitest/pci_endpoint_test/pci_epf_test) so that he can configure this
> > > IP
> > > *locally*.
> > > In fact, when doing this, he doesn't need to configure the interrupt
> > > address (MSI or MSI-X), because this IP provides a local interrupt line
> > > so that be connected to other blocks on the EP side.
> >
> > Right, so this confirms my hunch that the driver is being used in
> > a way that doesn't reflect the expected use case. Rather than
> > papering over the problem by hacking the core code, I'd rather see
> > the eDMA driver be updated to support both host and endpoint cases.
> > This probably boils down to a PCI vs non-PCI set of helpers.
> >
> > Alan, could you confirm whether we got it right?
>
> Thanks Marc and Gustavo. I appreciate all your comments and feedback.
>
> You both got it right. As Gustavo mentioned, I am trying to expand dw-edma
> for additional use cases.
>
> First new use case is for integration of dw-edma with pci-epf-test so the latter
> can initiate dma transfers locally from endpoint memory to host memory over the
> PCIe bus in response to a user command issued from the host-side command
> prompt using the pcitest utility. When the locally-initiated dma
> transfer completes
> in this use case on the endpoint side, dw-edma issues an interrupt to the local
> CPU on the endpoint side by way of a legacy interrupt and pci-epf-test issues
> an interrupt toward the remote host CPU across the PCIe bus by way of legacy,
> MSI, or possibly MSI-X interrupt.
>
> Second new use case is for integration of dw-edma with pci_endpoint_test
> running on the host CPU so the latter can initiate dma transfers locally from
> host-side in response to a user command issued from the host-side command
> prompt using the pcitest utility. This use case is for host systems that have
> Synopsys DesignWare PCI eDMA hardware on the host side. When the
> locally-initiated dma transfer completes in this use case on the host-side,
> dw-edma issues a legacy interrupt to its local host CPU and pci-epf-test running
> on the endpoint side issues a legacy, MSI, or possibly MSI-X interrupt
> across the
> PCIe bus toward the host CPU.
>
> When both the host and endpoint sides have the Synopsys DesignWare PCI
> eDMA hardware, more use cases become possible in which eDMA controllers
> from both systems can be engaged to move data. Embedded DMA controllers
> from other PCIe IP vendors may also be supported with additional dmaengine
> drivers under the Linux PCI Endpoint Framework with pci-epf-test, pcitest, and
> pci_endpoint_test suite as well as new PCI endpoint function drivers for such
> applications that require dma, for example nvme or virtio_net endpoint function
> drivers.
>
> I submitted a recent patch [1] and [2] which Gustavo ACk'd to decouple dw-edma
> from struct pci_dev. This enabled me to exercise dw-edma on some riscv host
> and endpoint systems that I work with.
>
> I will submit another patch to decouple dw-edma from struct msi_msg such
> that it would only call get_cached_msi_msg() on the host-side in its
> original use case with remotely initiated dma transfers using the BAR
> access method.
>
> The crash that I reported in __get_cached_msi_msg() is probably worth
> fixing too. It seems to be low impact since get_cached_msi_msg()
> seems to be called infrequently by a few callers.

It isn't about the frequency of the calls, nor the overhead of this
function. It is about the fundamental difference between a wired
interrupt (in most case a level triggered one) and a MSI (edge by
definition on PCI). By making get_cached_msi_msg() "safe" to be called
for non-MSI IRQs, you hide a variety of design bugs which would
otherwise be obvious, like the one you are currently dealing with.

Your eDMA driver uses MSI by construction, and is likely to use the MSI
semantics (edge triggering, coalescing, memory barrier). On the other
hand, your use case is likely to have interrupts with very different
semantics (level triggered, no barrier effect). Papering over these
differences is not the way to go, I'm afraid.

I would recommend that you adapt the driver to have a separate
interrupt management for the non-MSI case, or at least not blindly use
MSI-specific APIs when not using them.

Thanks,

M.
--
Jazz is not dead. It just smells funny...