Re: IB on s390 broken with commit 99db94940 "IB/core: Remove ib_device.dma_device"

From: Bart Van Assche
Date: Tue Feb 28 2017 - 13:55:37 EST


On Tue, 2017-02-28 at 09:53 +0100, Sebastian Ott wrote:
> On Mon, 27 Feb 2017, Bart Van Assche wrote:
>
> > On Mon, 2017-02-27 at 21:17 +0100, Sebastian Ott wrote:
> > > commit 99db94940 "IB/core: Remove ib_device.dma_device"
> > > breaks infiniband on s390 (and I think also other archs that do something
> > > like to_pci_dev(dev) in one of their dma_ops callbacks).
> > >
> > > With this commit you use the dma_ops of the device that called
> > > ib_register_device but you call e.g. dma_map with ib_device->dev
> > > as an argument.
> > >
> > > S390's (pci specific) dma_map uses to_pci_dev(dev) to look into the
> > > pci device (and its arch specific data) and oopses.
> > >
> > > Calling dma_map with ib_device->dev.parent would work but then it
> > > wouldn't make sense to copy dma_ops and mask from ib_device->dev.parent
> > > to ib_device->dev..
> >
> > How about something like the untested patch below?
>
> It works but it doesn't feel right (why should all pci devices have this
> duplicated data).
>
> Frankly I don't get the usecase of infiniband (sometimes) using
> device->dev.dma_ops instead of parent->dma_ops. Also that these values are
> selectively copied from the parent looks weird (opposed to all or nothing).
>
> What about reintroducing dma_device (as an infiniband internal) and set it
> to &ib_device->dev if you have to and to parent in all other cases?

Hello Sebastian,

There are three kinds of RDMA drivers:
- RDMA drivers that always use DMA for transferring data between memory and
HCA (e.g. mlx4, mlx5, ...). These drivers make the ULP call the PCI DMA
mapping functions directly.
- RDMA drivers that never use DMA directly but use another driver for
transferring data (e.g. rdma_rxe). These drivers make the ULP store virtual
addresses in .dma_address.
- RDMA drivers that decide whether to use PIO or DMA depending on e.g. the
QP type and the amount of data to be transferred (qib, hfi1). These drivers
also make the ULP store virtual addresses in .dma_address and decide
internally whether or not to invoke the PCI DMA mapping functions.
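
To make the distinction concrete, here is a minimal userspace sketch of the
three cases. It is only a model: struct dma_ops, hw_map, virt_map and
bus_offset are hypothetical stand-ins and not the kernel definitions.

```c
/* Userspace model only; simplified stand-ins, not kernel definitions. */
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

struct device;

/* Hypothetical, reduced ops table with a single callback. */
struct dma_ops {
	dma_addr_t (*map)(struct device *dev, void *cpu_addr, size_t len);
};

struct device {
	const struct dma_ops *ops;
	dma_addr_t bus_offset;	/* made-up CPU-to-bus translation */
};

/* Kind 1 (e.g. mlx4, mlx5): a real mapping that yields a bus address. */
static dma_addr_t hw_map(struct device *dev, void *cpu_addr, size_t len)
{
	(void)len;
	return (dma_addr_t)(uintptr_t)cpu_addr + dev->bus_offset;
}

/*
 * Kinds 2 and 3 (e.g. rdma_rxe, qib, hfi1): the "DMA address" handed
 * back to the ULP is simply the virtual address. A kind 3 driver would
 * decide later, internally, whether to do a real mapping.
 */
static dma_addr_t virt_map(struct device *dev, void *cpu_addr, size_t len)
{
	(void)dev;
	(void)len;
	return (dma_addr_t)(uintptr_t)cpu_addr;
}

static const struct dma_ops hw_ops = { .map = hw_map };
static const struct dma_ops virt_ops = { .map = virt_map };

/* Single entry point for the ULP; dispatches on the device's ops. */
static dma_addr_t dma_map(struct device *dev, void *cpu_addr, size_t len)
{
	return dev->ops->map(dev, cpu_addr, len);
}
```

In this model the ULP always calls dma_map() and the ops attached to the
device decide what ends up in the address field.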

This is why a custom DMA mapping API was introduced in the RDMA subsystem.
Until recently the Linux RDMA subsystem not only had its own DMA mapping
operations but also its own template for DMA mapping operations (struct
ib_dma_mapping_ops). This was not only confusing but also led to a multitude
of incomplete RDMA driver DMA mapping implementations whose behavior
differs slightly from that of the other DMA mapping operations. That's
why we want to evolve towards a single DMA mapping API. Reintroducing the
dma_device pointer in struct ib_device would make it impossible to use the
standard DMA mapping API for RDMA devices.
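
For reference, the failure reported at the top of this thread can be modeled
in userspace roughly as follows. This is a sketch under assumptions: the
is_pci tag, arch_data and ib_register() are hypothetical and exist only so
that the model can detect the mismatch instead of reading invalid memory.

```c
/* Userspace model only; simplified stand-ins, not kernel definitions. */
#include <stddef.h>

struct device;

struct dma_ops {
	int (*map)(struct device *dev);
};

struct device {
	struct device *parent;
	const struct dma_ops *ops;
	int is_pci;	/* model-only tag; the real struct device has none */
};

struct pci_dev {
	int arch_data;	/* stands in for arch-specific PCI state */
	struct device dev;
};

struct ib_device {
	struct device dev;
};

#define to_pci_dev(d) \
	((struct pci_dev *)((char *)(d) - offsetof(struct pci_dev, dev)))

/* PCI-specific callback that assumes dev is embedded in a pci_dev. */
static int pci_map(struct device *dev)
{
	if (!dev->is_pci)
		return -1;	/* a real callback has no such check */
	return to_pci_dev(dev)->arch_data;
}

static const struct dma_ops pci_ops = { .map = pci_map };

static int dma_map(struct device *dev)
{
	return dev->ops->map(dev);
}

/* Mimics copying the parent's ops into the IB device at registration. */
static void ib_register(struct ib_device *ibdev, struct pci_dev *pdev)
{
	ibdev->dev.parent = &pdev->dev;
	ibdev->dev.ops = pdev->dev.ops;
	ibdev->dev.is_pci = 0;
}
```

Here dma_map(&ibdev->dev) reaches pci_map() with a device that is not
embedded in a struct pci_dev, while dma_map(ibdev->dev.parent) works.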

Bart.