RE: [RFC] extending splice for copy offloading

From: Myklebust, Trond
Date: Mon Sep 30 2013 - 11:33:55 EST


> -----Original Message-----
> From: Ric Wheeler [mailto:rwheeler@xxxxxxxxxx]
> Sent: Monday, September 30, 2013 10:29 AM
> To: Miklos Szeredi
> Cc: J. Bruce Fields; Myklebust, Trond; Zach Brown; Anna Schumaker; Kernel
> Mailing List; Linux-Fsdevel; linux-nfs@xxxxxxxxxxxxxxx; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
>
> On 09/30/2013 10:24 AM, Miklos Szeredi wrote:
> > On Mon, Sep 30, 2013 at 4:52 PM, Ric Wheeler <rwheeler@xxxxxxxxxx>
> wrote:
> >> On 09/30/2013 10:51 AM, Miklos Szeredi wrote:
> >>> On Mon, Sep 30, 2013 at 4:34 PM, J. Bruce Fields
> >>> <bfields@xxxxxxxxxxxx>
> >>> wrote:
> >>>>> My other worry is about interruptibility/restartability. Ideas?
> >>>>>
> >>>>> What happens on splice(from, to, 4G) and it's a non-reflink copy?
> >>>>> Can the page cache copy be made restartable? Or should splice() be
> >>>>> allowed to return a short count? What happens on (non-reflink)
> >>>>> remote copies and huge request sizes?
> >>>> If I were writing an application that required copies to be
> >>>> restartable, I'd probably use the largest possible range in the
> >>>> reflink case but break the copy into smaller chunks in the splice case.
> >>>>
> >>> The app really doesn't want to care about that. And it doesn't want
> >>> to care about restartability, etc.. It's something the *kernel* has
> >>> to care about. You just can't have uninterruptible syscalls that
> >>> sleep for a "long" time, otherwise first you'll just have annoyed
> >>> users pressing ^C in vain; then, if the sleep is even longer,
> >>> warnings about task sleeping too long.
> >>>
> >>> One idea is letting splice() return a short count, and so the app
> >>> can safely issue SIZE_MAX requests and the kernel can decide if it
> >>> can copy the whole file in one go or if it wants to do it in smaller
> >>> chunks.
> >>>
> >> You cannot rely on a short count. That implies that an offloaded copy
> >> starts at byte 0 and the short count first bytes are all valid.
> > Huh?
> >
> > - app calls splice(from, 0, to, 0, SIZE_MAX)
> > 1) VFS calls ->direct_splice(from, 0, to, 0, SIZE_MAX)
> > 1.a) fs reflinks the whole file in a jiffy and returns the size of the file
> > 1 b) fs does copy offload of, say, 64MB and returns 64M
> > 2) VFS does page copy of, say, 1MB and returns 1MB
> > - app calls splice(from, X, to, X, SIZE_MAX) where X is the new offset
> > ...
> >
> > The point is: the app is always doing the same (incrementing offset
> > with the return value from splice) and the kernel can decide what is
> > the best size it can service within a single uninterruptible syscall.
> >
> > Wouldn't that work?
> >
> > Thanks,
> > Miklos
>
> No.
>
> Keep in mind that the offload operation in (1) might fail partially. The target
> file (the copy) is allocated, the question is what ranges have valid data.
>
> I don't see that (2) is interesting or really needed to be done in the kernel.
> If nothing else, it tends to confuse the discussion....
>

Anna's figures, that were presented at Plumber's, show that (2) is still worth doing on the _server_ for the case of NFS.

Cheers
Trond
¢éì®&Þ~º&¶¬–+-±éÝ¥Šw®žË±Êâmébžìdz¹Þ)í…æèw*jg¬±¨¶‰šŽŠÝj/êäz¹ÞŠà2ŠÞ¨è­Ú&¢)ß«a¶Úþø®G«éh®æj:+v‰¨Šwè†Ù>Wš±êÞiÛaxPjØm¶Ÿÿà -»+ƒùdš_