Re: [RFC] extending splice for copy offloading

From: Zach Brown
Date: Thu Sep 26 2013 - 14:56:03 EST

On Thu, Sep 26, 2013 at 10:58:05AM +0200, Miklos Szeredi wrote:
> On Wed, Sep 25, 2013 at 11:07 PM, Zach Brown <zab@xxxxxxxxxx> wrote:
> >> A client-side copy will be slower, but I guess it does have the
> >> advantage that the application can track progress to some degree, and
> >> abort it fairly quickly without leaving the file in a totally undefined
> >> state--and both might be useful if the copy's not a simple constant-time
> >> operation.
> >
> > I suppose, but can't the app achieve a nice middle ground by copying the
> > file in smaller syscalls? Avoid bulk data motion back to the client,
> > but still get notification every, I dunno, few hundred meg?
> Yes. And "cp" could just be switched from a read+write syscall
> pair to a single splice syscall using the same buffer size. Then
> the user would only notice that things got faster in the case of a
> server-side copy. No problems with long blocking times (at least not
> much worse than before).

Hmm, yes, that would be a nice outcome.
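
To make the "smaller syscalls" idea concrete, here is a rough sketch of such a loop. It is only an illustration under stated assumptions: file-to-file splice does not exist yet, so sendfile(2) stands in for it, and copy_chunked(), progress_fn and report_progress() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical progress callback: return nonzero to abort the copy. */
typedef int (*progress_fn)(off_t copied, void *arg);

/* Example callback: report the running total and keep going. */
int report_progress(off_t copied, void *arg)
{
    (void)arg;
    fprintf(stderr, "copied %lld bytes\n", (long long)copied);
    return 0;
}

/*
 * Copy up to len bytes from in_fd to out_fd, at most chunk bytes per
 * syscall, using and advancing both file offsets.  sendfile(2) is a
 * stand-in (an assumption) for the proposed file-to-file splice.
 * Returns the number of bytes copied, or -1 with errno set.
 */
off_t copy_chunked(int in_fd, int out_fd, off_t len, size_t chunk,
                   progress_fn progress, void *arg)
{
    off_t copied = 0;

    assert(chunk > 0);

    while (copied < len) {
        size_t want = chunk;
        ssize_t n;

        /* Do not ask for more than what is left. */
        if ((off_t)want > len - copied)
            want = (size_t)(len - copied);

        n = sendfile(out_fd, in_fd, NULL, want);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;      /* source ended before len bytes */
        copied += n;

        /* The caller gets control back after every chunk. */
        if (progress && progress(copied, arg))
            break;
    }
    return copied;
}
```

With a chunk of a few hundred megabytes the caller gets a notification, and a chance to abort, between syscalls, and a short copy leaves the destination at a known offset.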

> However "cp" doesn't do reflinking by default, it has a switch for
> that. If we just want "cp" and the like to use splice without fearing
> side effects then by default we should try to be as close to
> read+write behavior as possible. No?

I guess? I don't find requiring --reflink hugely compelling. But there
it is.

> That's what I'm really
> worrying about when you want to wire up splice to reflink by default.
> I do think there should be a flag for that. And if on the block level
> some magic happens, so be it. It's not the fs developer's worry any
> more ;)

Sure. So we'd have:

- a no-flag default that forbids knowingly copying with shared
  references, so that people who feel strongly about their assumptions
  about independent write durability get that behavior by default.

- a flag that allows shared references, for people who would otherwise
  use the file system shared reference ioctls (ocfs2 reflink, btrfs
  clone) but would like it to also do server-side read/write copies
  over nfs without additional intervention.

- a flag that requires shared references, for callers who don't want
  giant copies to take forever if they aren't instant. (The qemu guys
  asked for this at Plumbers.)

I think I can live with that.
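
As a sketch of how those three behaviors might be told apart, and nothing more: every name below is hypothetical (no flag names have been proposed here), and returning -EOPNOTSUPP when sharing is required but unavailable is my assumption, not a settled interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical modes matching the three cases listed above. */
enum copy_share_mode {
    COPY_SHARE_FORBID,   /* no flag: never knowingly share references */
    COPY_SHARE_ALLOW,    /* share if the fs can, otherwise copy data */
    COPY_SHARE_REQUIRE,  /* share or fail, never a bulk data copy */
};

/* What the implementation ends up doing for one request. */
enum copy_action {
    COPY_ACT_SHARE,      /* reflink/clone style shared references */
    COPY_ACT_DATA_COPY,  /* real data copy, server-side or local */
};

/*
 * Pick the action for a request.  can_share says whether the file
 * system is able to share references for this source/destination.
 * Returns 0 and sets *act, or a negative errno (assumed values).
 */
int choose_copy_action(enum copy_share_mode mode, bool can_share,
                       enum copy_action *act)
{
    assert(act != NULL);

    switch (mode) {
    case COPY_SHARE_FORBID:
        *act = COPY_ACT_DATA_COPY;
        return 0;
    case COPY_SHARE_ALLOW:
        *act = can_share ? COPY_ACT_SHARE : COPY_ACT_DATA_COPY;
        return 0;
    case COPY_SHARE_REQUIRE:
        if (!can_share)
            return -EOPNOTSUPP;
        *act = COPY_ACT_SHARE;
        return 0;
    }
    return -EINVAL;
}
```

The point of the third mode is visible in the REQUIRE case: the caller gets an error right away instead of silently falling into a long data copy.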

- z
