Re: [RFC] extending splice for copy offloading

From: Ric Wheeler
Date: Fri Sep 27 2013 - 10:01:44 EST

On 09/27/2013 12:47 AM, Miklos Szeredi wrote:
On Thu, Sep 26, 2013 at 11:23 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
On 09/26/2013 03:53 PM, Miklos Szeredi wrote:
On Thu, Sep 26, 2013 at 9:06 PM, Zach Brown <zab@xxxxxxxxxx> wrote:

But I'm not sure it's worth the effort; 99% of the use of this
interface will be copying whole files. And for that perhaps we need a
different API, one which has been discussed some time ago:
asynchronous copyfile() returns immediately with a pollable event
descriptor indicating copy progress, and some way to cancel the copy.
And that can internally rely on ->direct_splice(), with appropriate
algorithms for determining the optimal chunk size.
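Below is a minimal Python sketch of the chunked loop that a whole-file copyfile() like this might wrap. It is not the proposed API. It makes these assumptions:

- Linux's later copy_file_range(2) syscall (kernel 4.5, exposed as os.copy_file_range in Python 3.8+) stands in for ->direct_splice().
- When the kernel refuses the offload, the loop falls back to pread/pwrite.
- A hypothetical progress callback and a threading.Event stand in for the pollable progress descriptor and the cancel mechanism.
- The fixed chunk size is arbitrary. It is not the "optimal chunk size" algorithm mentioned above.

```python
import errno
import os
import threading
from typing import Callable, Optional

# Hypothetical sketch, not a proposed kernel API.
# errnos meaning "this copy can't be offloaded here": fall back to read/write.
_FALLBACK_ERRNOS = {errno.ENOSYS, errno.EXDEV, errno.EINVAL,
                    errno.EOPNOTSUPP, errno.EPERM}


def _copy_chunk(src_fd: int, dst_fd: int, offset: int, length: int) -> int:
    """Copy up to `length` bytes at `offset`; return how many were copied."""
    if hasattr(os, "copy_file_range"):
        try:
            # Offload path: the kernel/filesystem may move (or reflink) the
            # data without it passing through userspace.
            return os.copy_file_range(src_fd, dst_fd, length, offset, offset)
        except OSError as e:
            if e.errno not in _FALLBACK_ERRNOS:
                raise
    # Fallback path: read the data and write it back out ourselves.
    data = os.pread(src_fd, length, offset)
    return os.pwrite(dst_fd, data, offset) if data else 0


def copy_whole_file(
    src_path: str,
    dst_path: str,
    chunk_size: int = 8 * 1024 * 1024,  # arbitrary assumption
    progress: Optional[Callable[[int, int], None]] = None,
    cancel: Optional[threading.Event] = None,
) -> int:
    """Copy src_path to dst_path chunk by chunk and return the bytes copied.

    Calls progress(done, total) after each chunk; stops early if cancel is set.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        total = os.fstat(src.fileno()).st_size
        done = 0
        while done < total:
            if cancel is not None and cancel.is_set():
                break
            n = _copy_chunk(src.fileno(), dst.fileno(), done,
                            min(chunk_size, total - done))
            if n == 0:  # source shrank underneath us
                break
            done += n
            if progress is not None:
                progress(done, total)
        return done
```

Running copy_whole_file() in a worker thread gives the caller the "returns immediately" behaviour described above. A kernel-side interface would presumably keep this bookkeeping internal instead.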
And perhaps we don't. Perhaps we can provide this much simpler
data-plane interface that works well enough for most everyone and can
avoid going down the async rat hole, yet again.
I think either buffering or async is needed to get good performance
without too much complexity in the app (which is not good). Buffering
works quite well for regular I/O, so maybe it's the way to go here as
well.


Buffering misses the whole point of the copy offload - the idea is *not* to
read or write the actual data in the most interesting cases, which offload
the operation to a smart target device or file system.
I meant buffering the COPY, not the data. Doing the COPY
synchronously will always incur a performance penalty, the amount
depending on the latency, which can be significant with networking.
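To put a rough number on that penalty, here is a simplified model. It assumes the COPY is issued as fixed-size chunks and that each batch of in-flight chunks costs one network round trip, ignoring server-side work entirely. The function name and the chunk/RTT figures are hypothetical and illustrative, not measurements.

```python
def sync_round_trip_penalty(total_bytes: int, chunk_bytes: int,
                            rtt_seconds: float, in_flight: int = 1) -> float:
    """Latency added by round trips when copying total_bytes in chunk_bytes pieces.

    in_flight=1 models a strictly synchronous COPY (wait for each reply);
    larger values model buffered/queued COPY requests overlapping their round trips.
    """
    round_trips = -(-total_bytes // chunk_bytes)   # ceiling division
    batches = -(-round_trips // in_flight)         # round trips that can't overlap
    return batches * rtt_seconds
```

For example, 1 GiB copied in 1 MiB chunks over a link with a 1 ms RTT adds about 1.02 s when fully synchronous. With 32 requests in flight, it adds about 0.03 s.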

We think of write(2) as a synchronous interface, because that's the
appearance we get from all that hard work the page cache and delayed
writeback code does to make an asynchronous operation look as if it
was synchronous. So from a userspace API perspective a sync interface
is nice, but inside we almost always have async interfaces to do the
actual work.


I think that you are an order of magnitude off here in thinking about the scale of the operations.

An enabled, synchronous copy offload to an array (or one that turns into a reflink locally) costs effectively just the call itself. Let's say no slower than one I/O to an S-ATA disk (10 ms?) as a pessimistic guess. Realistically, that call is much faster than that worst-case number.

Copying any substantial amount of data - like the target workload of VM images or media files - would mean hundreds of MBs per copy, and that would take seconds or minutes.
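The scale argument can be checked with back-of-the-envelope arithmetic. The sketch below takes the pessimistic 10 ms per offloaded call from above and assumes, purely for illustration, a 500 MB file and 100 MB/s of sustained throughput for a conventional read/write copy. The function names are hypothetical.

```python
OFFLOAD_CALL_SECONDS = 0.010  # pessimistic per-call guess from the text above


def conventional_copy_seconds(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to actually move the data: size divided by sustained throughput."""
    return size_bytes / throughput_bytes_per_s


def offload_advantage(size_bytes: float, throughput_bytes_per_s: float,
                      call_seconds: float = OFFLOAD_CALL_SECONDS) -> float:
    """How many times longer moving the data takes than one offload call."""
    return conventional_copy_seconds(size_bytes, throughput_bytes_per_s) / call_seconds
```

Under those assumptions the conventional copy takes 5 s, about 500 times the pessimistic cost of one synchronous offload call. At that ratio, synchronous call latency is not the dominant cost.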

We should really work on getting the basic mechanism working and robust without any complications, then we can look at real, measured performance and see if there is any justification for adding complexity.


