Re: Enabling peer to peer device transactions for PCIe devices

From: Petrosyan, Ludwig
Date: Sun Oct 22 2017 - 02:14:37 EST


Hello Logan

Thank you very much for your response.
It could be that what I have done is stupid...
But at first sight it should be simple:
PCIe write transactions are address routed, so if the address of the other endpoint is written in the packet header, the TLP has to be routed to that endpoint by the PCIe switch. A DMA read from the endpoint really consists of write transactions issued by the endpoint. Usually (Xilinx core), to start the DMA one has to write the destination address to the DMA control register of the endpoint. So I have changed the device driver to set this register to the physical address of the other endpoint (obtained by calling get_resource start on the other endpoint; it is the same address that I can see as memories behind bridge in lspci -vvvv -s bus-address of the switch port). The endpoint should now start sending write TLPs with the other endpoint's address in the TLP header.
But this is not working (I want to understand why ...). I can see that the first address of the destination endpoint is changed (to the wrong value, 0xFF).
Now I want to try preparing the DMA buffer in the driver of one endpoint, but using the physical address of the other endpoint.
It could be that this will never work, but I want to understand why, and where my error is ...
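
A minimal user-space sketch of the address routing described above, to make the reasoning concrete. It models only the window comparison a switch performs on a memory write and is not kernel or driver code. The names mem_window, decode_window and route_mem_write are made up for this illustration; the register layout assumed is the 32-bit Memory Base / Memory Limit pair of a bridge (Type 1) configuration header, which is what lspci reports as the memory behind the bridge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit memory window forwarded downstream by one switch port. */
struct mem_window {
    uint32_t base;   /* first address forwarded downstream */
    uint32_t limit;  /* last address forwarded downstream (inclusive) */
};

/*
 * Decode the 16-bit Memory Base / Memory Limit registers of a bridge.
 * Bits 15:4 hold address bits 31:20; the low 20 address bits are
 * implied: all zeros for the base, all ones for the limit.
 */
struct mem_window decode_window(uint16_t base_reg, uint16_t limit_reg)
{
    struct mem_window w;

    w.base = (uint32_t)(base_reg & 0xFFF0u) << 16;
    w.limit = ((uint32_t)(limit_reg & 0xFFF0u) << 16) | 0x000FFFFFu;
    return w;
}

/*
 * Return the index of the downstream port whose window contains addr,
 * or -1 if no window matches (the TLP is then forwarded upstream).
 * A window with base > limit is treated as disabled.
 */
int route_mem_write(const struct mem_window *ports, size_t nports,
                    uint32_t addr)
{
    assert(ports != NULL || nports == 0);

    for (size_t i = 0; i < nports; i++) {
        if (ports[i].base <= ports[i].limit &&
            addr >= ports[i].base && addr <= ports[i].limit)
            return (int)i;
    }
    return -1;
}
```

For example, with two windows decoded from the made-up register values 0xF7D0/0xF7D0 and 0xF7E0/0xF7F0, a write to 0xF7E00010 matches the second port (index 1), while a write to 0x80000000 matches no window and goes upstream. The sketch only shows where a write TLP would be steered; it says nothing about whether the target accepts it.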

with best regards

Ludwig

----- Original Message -----
From: "Logan Gunthorpe" <logang@xxxxxxxxxxxx>
To: "Ludwig Petrosyan" <ludwig.petrosyan@xxxxxxx>, "Deucher, Alexander" <Alexander.Deucher@xxxxxxx>, "linux-kernel@xxxxxxxxxxxxxxx" <linux-kernel@xxxxxxxxxxxxxxx>, "linux-rdma@xxxxxxxxxxxxxxx" <linux-rdma@xxxxxxxxxxxxxxx>, "linux-nvdimm@xxxxxxxxxxxx" <linux-nvdimm@xxxxxxxxxxxx>, "Linux-media@xxxxxxxxxxxxxxx" <Linux-media@xxxxxxxxxxxxxxx>, "dri-devel@xxxxxxxxxxxxxxxxxxxxx" <dri-devel@xxxxxxxxxxxxxxxxxxxxx>, "linux-pci@xxxxxxxxxxxxxxx" <linux-pci@xxxxxxxxxxxxxxx>
Cc: "Bridgman, John" <John.Bridgman@xxxxxxx>, "Kuehling, Felix" <Felix.Kuehling@xxxxxxx>, "Sagalovitch, Serguei" <Serguei.Sagalovitch@xxxxxxx>, "Blinzer, Paul" <Paul.Blinzer@xxxxxxx>, "Koenig, Christian" <Christian.Koenig@xxxxxxx>, "Suthikulpanit, Suravee" <Suravee.Suthikulpanit@xxxxxxx>, "Sander, Ben" <ben.sander@xxxxxxx>
Sent: Friday, 20 October, 2017 17:48:58
Subject: Re: Enabling peer to peer device transactions for PCIe devices

Hi Ludwig,

P2P transactions are still *very* experimental at the moment and take a
lot of expertise to get working in a general setup. It will definitely
require changes to the kernel, including the drivers of all the devices
you are trying to make talk to each other. If you're up for it you can
take a look at:

https://github.com/sbates130272/linux-p2pmem/

Which has our current rough work making NVMe fabrics use p2p transactions.

Logan

On 10/20/2017 6:36 AM, Ludwig Petrosyan wrote:
> Dear Linux kernel group
>
> My name is Ludwig Petrosyan and I work at DESY (Germany).
>
> we are responsible for the control system of all accelerators in DESY.
>
> About 7-8 years ago we switched to MTCA.4 systems, using PCIe as the
> central bus.
>
> I am mostly responsible for the Linux drivers of the AMC Cards (PCIe
> endpoints).
>
> The idea is to start using peer-to-peer transactions between PCIe
> endpoints (DMA and/or ordinary Read/Write).
>
> Could you please advise me where to start? Is there some documentation
> on how to do it?
>
>
> with best regards
>
>
> Ludwig
>
>
> On 11/21/2016 09:36 PM, Deucher, Alexander wrote:
>> This is certainly not the first time this has been brought up, but I'd
>> like to try and get some consensus on the best way to move this
>> forward. Allowing devices to talk directly improves performance and
>> reduces latency by avoiding the use of staging buffers in system
>> memory. Also in cases where both devices are behind a switch, it
>> avoids the CPU entirely. Most current APIs (DirectGMA, PeerDirect,
>> CUDA, HSA) that deal with this are pointer based. Ideally we'd be
>> able to take a CPU virtual address and be able to get to a physical
>> address taking into account IOMMUs, etc. Having struct pages for the
>> memory would allow it to work more generally and wouldn't require as
>> much explicit support in drivers that wanted to use it.
>> Some use cases:
>> 1. Storage devices streaming directly to GPU device memory
>> 2. GPU device memory to GPU device memory streaming
>> 3. DVB/V4L/SDI devices streaming directly to GPU device memory
>> 4. DVB/V4L/SDI devices streaming directly to storage devices
>> Here is a relatively simple example of how this could work for
>> testing. This is obviously not a complete solution.
>> - Device memory will be registered with the Linux memory sub-system by
>> creating corresponding struct page structures for device memory
>> - get_user_pages_fast() will return corresponding struct pages when
>> CPU address points to the device memory
>> - put_page() will deal with struct pages for device memory
>> Previously proposed solutions and related proposals:
>> 1. P2P DMA
>> DMA-API/PCI map_peer_resource support for peer-to-peer
>> (http://www.spinics.net/lists/linux-pci/msg44560.html)
>> Pros: Low impact, already largely reviewed.
>> Cons: requires explicit support in all drivers that want to support
>> it, doesn't handle S/G in device memory.
>> 2. ZONE_DEVICE IO
>> Direct I/O and DMA for persistent memory
>> (https://lwn.net/Articles/672457/)
>> Add support for ZONE_DEVICE IO memory with struct pages.
>> (https://patchwork.kernel.org/patch/8583221/)
>> Pro: Doesn't waste system memory for ZONE metadata
>> Cons: CPU access to ZONE metadata slow, may be lost, corrupted on
>> device reset.
>> 3. DMA-BUF
>> RDMA subsystem DMA-BUF support
>> (http://www.spinics.net/lists/linux-rdma/msg38748.html)
>> Pros: uses existing dma-buf interface
>> Cons: dma-buf is handle based, requires explicit dma-buf support in
>> drivers.
>>
>> 4. iopmem
>> iopmem : A block device for PCIe memory
>> (https://lwn.net/Articles/703895/)
>> 5. HMM
>> Heterogeneous Memory Management
>> (http://lkml.iu.edu/hypermail/linux/kernel/1611.2/02473.html)
>>
>> 6. Some new mmap-like interface that takes a userptr and a length and
>> returns a dma-buf and offset?
>> Alex
>>