Re: [LSF/MM TOPIC] Direct block mapping through fs for device

From: Adam Manzanares
Date: Fri Apr 26 2019 - 16:29:15 EST


On Thu, 2019-04-25 at 21:38 -0400, Jerome Glisse wrote:
> I see that there are still empty spots in the LSF/MM schedule, so I would
> like to have a discussion on allowing direct block mapping of files for
> devices (NIC, GPU, FPGA, ...). This is an mm, fs and block discussion,
> though the mm side is pretty light, i.e. only adding 2 callbacks to
> vm_operations_struct:
>
>     int (*device_map)(struct vm_area_struct *vma,
>                       struct device *importer,
>                       struct dma_buf **bufp,
>                       unsigned long start,
>                       unsigned long end,
>                       unsigned flags,
>                       dma_addr_t *pa);
>
>     // Some flags I can think of:
>     DEVICE_MAP_FLAG_PIN               // ie return a dma_buf object
>     DEVICE_MAP_FLAG_WRITE             // importer wants to be able to write
>     DEVICE_MAP_FLAG_SUPPORT_ATOMIC_OP // importer wants to do atomic
>                                       // operations on the mapping
>
>     void (*device_unmap)(struct vm_area_struct *vma,
>                          struct device *importer,
>                          unsigned long start,
>                          unsigned long end,
>                          dma_addr_t *pa);
>
> Each filesystem could add this callback and decide whether or not to
> allow the importer to directly map blocks. The filesystem can use
> whatever logic it wants to make that decision. For instance, if there
> are pages in the page cache for the range, it can say no and the device
> would fall back to main memory. The filesystem can also update its
> internal data structures to keep track of direct block mappings.
>
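
To make the filesystem side concrete, here is a minimal sketch of what
such a ->device_map() could look like under this proposal. All myfs_*
helpers are hypothetical placeholders; only filemap_range_has_page() is
an existing kernel helper.

    /*
     * Hypothetical ->device_map() for a filesystem "myfs", following the
     * callback proposed above. myfs_record_device_map() and
     * myfs_bdev_device_map() stand in for fs-internal bookkeeping and the
     * forwarding to the block device; they are not existing kernel APIs.
     */
    static int myfs_device_map(struct vm_area_struct *vma,
                               struct device *importer,
                               struct dma_buf **bufp,
                               unsigned long start, unsigned long end,
                               unsigned flags, dma_addr_t *pa)
    {
            struct inode *inode = file_inode(vma->vm_file);
            loff_t off = (loff_t)(start - vma->vm_start) +
                         ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
            loff_t len = end - start;

            /* This sketch does not support the pinned (dma_buf) case. */
            if (flags & DEVICE_MAP_FLAG_PIN)
                    return -EOPNOTSUPP;

            /*
             * If any page of the range is in the page cache, say no and
             * let the importer fall back to faulting pages the usual way.
             */
            if (filemap_range_has_page(inode->i_mapping, off, off + len - 1))
                    return -EBUSY;

            /* Track the mapping so it can be invalidated later. */
            myfs_record_device_map(inode, importer, off, len);

            /* Forward to the block device, which may still refuse. */
            return myfs_bdev_device_map(inode->i_sb->s_bdev, importer,
                                        off, len, flags, pa);
    }
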
> If the filesystem decides to allow the direct block mapping, it forwards
> the request to the block device, which itself can decide to forbid the
> direct mapping, again for any reason: for instance, running out of BAR
> space, peer to peer between the block device and the importer device not
> being supported, or the block device not wanting to allow writeable peer
> mappings ...
>
>
> So the event flow is:
>     1  program mmaps a file (and never intends to access it with the CPU)
>     2  program tries to access the mmap from a device A
>     3  device A driver sees the device_map callback on the vma and calls it
>     4a on success, device A driver programs the device with the mapped
>        dma address
>     4b on failure, device A driver falls back to faulting so that it can
>        use pages from the page cache
>
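
On the importer side, steps 3/4a/4b could look roughly like the sketch
below; the deviceA_* helpers and the struct deviceA layout are
hypothetical driver-internal details, not existing APIs.

    /* Hypothetical importer path for steps 3/4a/4b above. */
    static int deviceA_map_range(struct deviceA *adev,
                                 struct vm_area_struct *vma,
                                 unsigned long start, unsigned long end)
    {
            dma_addr_t pa;
            int ret = -ENODEV;

            /* Step 3: the vma exposes device_map, so try the direct path. */
            if (vma->vm_ops && vma->vm_ops->device_map)
                    ret = vma->vm_ops->device_map(vma, adev->dev, NULL,
                                                  start, end,
                                                  DEVICE_MAP_FLAG_WRITE, &pa);

            /* Step 4a: program the device with the mapped dma address. */
            if (!ret)
                    return deviceA_program_dma(adev, start, end, pa);

            /* Step 4b: fall back to faulting pages from the page cache. */
            return deviceA_map_via_page_cache(adev, vma, start, end);
    }
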
> This API assumes that the importer supports mmu notifiers and thus that
> the fs can invalidate the device mapping at _any_ time by sending an mmu
> notifier to all mappings of the file (for a given range in the file or
> for the whole file). Obviously you want to minimize disruption and thus
> only invalidate when necessary.
>
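
On the invalidation side, one existing way for a filesystem to reach
every CPU mapping of a file range, and through mmu notifiers any importer
mirroring those mappings, is unmap_mapping_range(), much like the
truncate path does today. A minimal sketch:

    /*
     * Sketch: invalidate all user mappings of a file range. Zapping the
     * page tables through unmap_mapping_range() fires mmu notifier
     * invalidations for every vma mapping this range of the file, which
     * is what tells importers to drop their direct block mappings.
     */
    static void myfs_invalidate_range(struct inode *inode, loff_t off,
                                      loff_t len)
    {
            unmap_mapping_range(inode->i_mapping, off, len, 1);
    }
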
> The dma_buf parameter can be used to add pinning support for filesystems
> that wish to support that case too. Here the mapping lifetime gets
> disconnected from the vma and is transferred to the dma_buf allocated by
> the filesystem. Again, the filesystem can decide to say no, as pinning
> blocks has drastic consequences for the filesystem and the block device.
>
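
For the pinned case, importer-side usage might look like the following,
assuming the flags and callback above; dma_buf_put() is the existing way
to drop the reference once the importer is done with the mapping.

    /* Sketch of an importer requesting a pinned mapping. */
    struct dma_buf *buf = NULL;
    dma_addr_t pa;
    int ret;

    ret = vma->vm_ops->device_map(vma, adev->dev, &buf, start, end,
                                  DEVICE_MAP_FLAG_PIN |
                                  DEVICE_MAP_FLAG_WRITE, &pa);
    if (!ret) {
            /*
             * The mapping now lives as long as the dma_buf, independently
             * of the vma; release it with dma_buf_put() when done.
             */
            deviceA_program_dma(adev, start, end, pa);
            /* ... use the mapping ... */
            dma_buf_put(buf);
    }
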
>
> This has some similarities to the hmmap and caching topic (which is
> mapping blocks directly to the CPU AFAIU), but device mapping can cut
> some corners; for instance, some devices can forgo atomic operations on
> such mappings and thus can work over PCIe, while the CPU can not do
> atomics to a PCIe BAR.
>
> Also, this API can be used to allow peer to peer access between devices
> when the vma is an mmap of a device file and thus the
> vm_operations_struct comes from some exporter device driver. So the same
> 2 vm_operations_struct callbacks can be used in more cases than what I
> just described here.
>
>
> So I would like to gather people's feedback on the general approach and
> a few things like:
>
>     - Do block devices need to be able to invalidate such mappings too?
>
>       It is easy for the fs to invalidate, as it can walk the file's
>       mappings, but block devices do not know about files.
>
>     - Do we want to provide some generic implementation to share across
>       filesystems?
>
>     - Maybe some shared helpers for block devices that could track the
>       files corresponding to peer mappings?

I'm interested in being a part of this discussion.

>
>
> Cheers,
> Jérôme