Re: [PATCH v4 0/4] Implement dmabuf direct I/O via copy_file_range
From: Christoph Hellwig
Date: Tue Jun 10 2025 - 09:41:00 EST
On Tue, Jun 10, 2025 at 12:52:18PM +0200, Christian König wrote:
> >> dma_addr_t/len array now that the new DMA API supporting that has been
> >> merged. Is there any chance the dma-buf maintainers could start to kick this
> >> off? I'm of course happy to assist.
>
> Work on that is already underway for some time.
>
> Most GPU drivers already do sg_table -> DMA array conversion, I need
> to push on the remaining to clean up.
Do you have a pointer?
> >> Yes, that's really puzzling and should be addressed first.
> > With high CPU performance (e.g., 3GHz), GUP (get_user_pages) overhead
> > is relatively low (observed in 3GHz tests).
>
> Even on a low end CPU walking the page tables and grabbing references
> shouldn't be that much of an overhead.
Yes.
>
> There must be some reason why you see so much CPU overhead. E.g.
> compound pages are broken up or similar which should not happen in
> the first place.
pin_user_pages unfortunately outputs an array of struct pages, each
covering PAGE_SIZE (modulo the offset into the first page and a shorter
last length). The block direct I/O code has fairly recently grown code
to reassemble folios from them, which did speed up some workloads.
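
To illustrate the kind of reassembly meant here, below is a minimal
userspace sketch, not the actual block layer code. It assumes pages are
modelled only by their page frame numbers; struct extent and
coalesce_pfns() are hypothetical names made up for this example. The
kernel works on folios rather than on bare pfn contiguity, so treat this
only as a model of why merging per-page entries cuts the per-entry work.

```c
#include <stddef.h>

/*
 * Hypothetical userspace model: one array entry per PAGE_SIZE page,
 * identified by its page frame number, similar in shape to the
 * per-page array a pinning call hands back.
 */
struct extent {
	unsigned long first_pfn;	/* first page frame in the run */
	size_t nr_pages;		/* number of contiguous pages in the run */
};

/*
 * Merge runs of consecutive page frame numbers into extents.
 * Returns the number of extents written to @out. If @out fills up
 * (@max_out entries), the remaining pages are left unprocessed.
 */
static size_t coalesce_pfns(const unsigned long *pfns, size_t n,
			    struct extent *out, size_t max_out)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		/* Extend the current extent if this page directly follows it. */
		if (used > 0 &&
		    out[used - 1].first_pfn + out[used - 1].nr_pages == pfns[i]) {
			out[used - 1].nr_pages++;
			continue;
		}

		/* Otherwise start a new extent, if there is room. */
		if (used == max_out)
			break;
		out[used].first_pfn = pfns[i];
		out[used].nr_pages = 1;
		used++;
	}

	return used;
}
```

With seven input pages where four and then two are contiguous, this
produces three extents instead of seven entries, which is roughly the
saving later stages see when they no longer handle each page separately.
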
Is this test using the block device or iomap direct I/O code? What
kernel version is it run on?