Re: [RFC] extending splice for copy offloading

From: Ric Wheeler
Date: Mon Sep 30 2013 - 11:29:30 EST

On 09/30/2013 10:24 AM, Miklos Szeredi wrote:
On Mon, Sep 30, 2013 at 4:52 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
On 09/30/2013 10:51 AM, Miklos Szeredi wrote:
On Mon, Sep 30, 2013 at 4:34 PM, J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
My other worry is about interruptibility/restartability. Ideas?

What happens on splice(from, to, 4G) and it's a non-reflink copy?
Can the page cache copy be made restartable? Or should splice() be
allowed to return a short count? What happens on (non-reflink) remote
copies and huge request sizes?
If I were writing an application that required copies to be restartable,
I'd probably use the largest possible range in the reflink case but
break the copy into smaller chunks in the splice case.

The app really doesn't want to care about that. And it doesn't want
to care about restartability, etc. It's something the *kernel* has
to care about. You just can't have uninterruptible syscalls that
sleep for a "long" time, otherwise first you'll just have annoyed
users pressing ^C in vain; then, if the sleep is even longer, warnings
about task sleeping too long.

One idea is letting splice() return a short count, and so the app can
safely issue SIZE_MAX requests and the kernel can decide if it can
copy the whole file in one go or if it wants to do it in smaller
chunks.

You cannot rely on a short count. That implies that an offloaded copy starts
at byte 0 and the short count first bytes are all valid.

- app calls splice(from, 0, to, 0, SIZE_MAX)
1) VFS calls ->direct_splice(from, 0, to, 0, SIZE_MAX)
1.a) fs reflinks the whole file in a jiffy and returns the size of the file
1.b) fs does copy offload of, say, 64MB and returns 64MB
2) VFS does page copy of, say, 1MB and returns 1MB
- app calls splice(from, X, to, X, SIZE_MAX) where X is the new offset

The point is: the app is always doing the same (incrementing offset
with the return value from splice) and the kernel can decide what is
the best size it can service within a single uninterruptible syscall.

Wouldn't that work?
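
A rough userspace sketch of the loop described above, for illustration only. copy_some() is a hypothetical stand-in for the proposed splice(from, off, to, off, len); it is emulated here with pread()/pwrite() and an arbitrary 4096-byte cap (CHUNK_MAX, also made up) so that short counts actually occur. copy_file() is the application side: it always asks for SIZE_MAX and advances the offset by whatever comes back. EINTR handling is left out to keep it short.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed per-call cap; in the proposal the kernel would choose this. */
#define CHUNK_MAX 4096

/*
 * Hypothetical stand-in for splice(from, off, to, off, len).
 * Copies at most CHUNK_MAX bytes at the given offset and returns the
 * number of bytes copied, 0 at end of file, or -1 on error.
 */
static ssize_t copy_some(int from, int to, off_t off, size_t len)
{
    char buf[CHUNK_MAX];
    size_t want = len < sizeof(buf) ? len : sizeof(buf);
    ssize_t got = pread(from, buf, want, off);

    if (got <= 0)
        return got;

    ssize_t done = 0;
    while (done < got) {
        ssize_t w = pwrite(to, buf + done, (size_t)(got - done), off + done);
        if (w <= 0)
            return done > 0 ? done : -1;  /* report what did land */
        done += w;
    }
    return done;
}

/*
 * Application side: request "everything" each time and let the callee
 * decide how much to service per call. Returns total bytes copied,
 * or -1 on error.
 */
static off_t copy_file(int from, int to)
{
    off_t off = 0;

    for (;;) {
        ssize_t n = copy_some(from, to, off, SIZE_MAX);
        if (n < 0)
            return -1;
        if (n == 0)
            return off;   /* end of source file */
        off += n;
    }
}
```

The caller never needs to know whether a given call was serviced by a reflink, an offload, or a page copy; it only sees a count. Note that this relies on the returned count always describing a valid prefix starting at the requested offset.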



Keep in mind that the offload operation in (1) might fail partially. The target file (the copy) is allocated; the question is which ranges have valid data.
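
To make the concern concrete, here is a small sketch. The struct and function names are invented for illustration and assume an offload engine that can report which ranges it finished, sorted by start offset and non-overlapping. A short count can only express the contiguous prefix starting at byte 0; anything completed after the first hole is invisible to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one range the offload finished copying. */
struct done_extent {
    uint64_t start;
    uint64_t len;
};

/*
 * Largest N such that bytes [0, N) are all valid. This is the only
 * thing a short count can tell the caller. Extents must be sorted by
 * start and must not overlap.
 */
static uint64_t valid_prefix(const struct done_extent *ext, size_t n)
{
    uint64_t end = 0;

    for (size_t i = 0; i < n; i++) {
        if (ext[i].start > end)
            break;                      /* hole: prefix stops here */
        end = ext[i].start + ext[i].len;
    }
    return end;
}

/* Bytes that were copied but lie beyond the prefix a short count reports. */
static uint64_t unreported_bytes(const struct done_extent *ext, size_t n)
{
    uint64_t prefix = valid_prefix(ext, n);
    uint64_t lost = 0;

    for (size_t i = 0; i < n; i++) {
        if (ext[i].start >= prefix)
            lost += ext[i].len;
    }
    return lost;
}
```

For example, if the ranges [0, 64MB) and [128MB, 192MB) completed before the failure, the short count would be 64MB and the second 64MB of valid data would have to be copied again.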

I don't see that (2) is interesting or really needs to be done in the kernel. If nothing else, it tends to confuse the discussion...

