Re: [PATCH] splice support #2

From: Andrew Morton
Date: Thu Mar 30 2006 - 05:14:07 EST


Jens Axboe <axboe@xxxxxxx> wrote:
>
> Hi,
>
> This patch should resolve all issues mentioned so far. I'd still like to
> implement the page moving, but that should just be a separate patch.
> After this round of patching, I thought I'd toss a fresh version out
> there for all to look at.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxx>
>
> ...
>
> --- a/arch/ia64/kernel/entry.S
> +++ b/arch/ia64/kernel/entry.S
> @@ -1605,5 +1605,6 @@ sys_call_table:
> data8 sys_ni_syscall // reserved for pselect
> data8 sys_ni_syscall // 1295 reserved for ppoll
> data8 sys_unshare
> + date8 sys_splice

typo

> +static void *page_cache_pipe_buf_map(struct file *file,
> + struct pipe_inode_info *info,
> + struct pipe_buffer *buf)
> +{
> + struct page *page = buf->page;
> +
> + lock_page(page);
> +
> + if (!PageUptodate(page)) {

|| page->mapping == NULL

> + struct page *pages[PIPE_BUFFERS];
> + struct page *page;
> + pgoff_t index, pidx;
> + int i;
> +
> + index = in->f_pos >> PAGE_CACHE_SHIFT;
> + offset = in->f_pos & ~PAGE_CACHE_MASK;
> + nr_pages = (len + offset + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
> +
> + if (nr_pages > PIPE_BUFFERS)
> + nr_pages = PIPE_BUFFERS;
> +
> + /*
> + * initiate read-ahead on this page range
> + */
> + do_page_cache_readahead(mapping, in, index, nr_pages);
> +
> + /*
> + * Get as many pages from the page cache as possible..
> + * Start IO on the page cache entries we create (we
> + * can assume that any pre-existing ones we find have
> + * already had IO started on them).
> + */
> + i = find_get_pages(mapping, index, nr_pages, pages);
> +
> + /*
> + * common case - we found all pages and they are contiguous,
> + * kick them off
> + */
> + if (i && (pages[i - 1]->index == index + i - 1))
> + goto splice_them;
> +
> + memset(&pages[i], 0, (nr_pages - i) * sizeof(struct page *));
> +
> + /*
> + * find_get_pages() may not return consecutive pages, so loop
> + * over the array moving pages and filling the rest, if need be.
> + */
> + for (i = 0, pidx = index; i < nr_pages; pidx++, i++) {
> + if (!pages[i]) {
> + int error;
> +fill_page:
> + /*
> + * no page there, look one up / create it
> + */
> + page = find_or_create_page(mapping, pidx,
> + mapping_gfp_mask(mapping));
> + if (!page)
> + break;
> +
> + if (PageUptodate(page))
> + unlock_page(page);
> + else {
> + error = mapping->a_ops->readpage(in, page);
> +
> + if (unlikely(error)) {
> + page_cache_release(page);
> + break;
> + }
> + }
> + pages[i] = page;
> + } else if (pages[i]->index != pidx) {
> + page = pages[i];
> + /*
> + * page isn't in the right spot, move it and jump
> + * back to filling this one. we know that ->index
> + * is larger than pidx
> + */
> + pages[i + page->index - pidx] = page;
> + pages[i] = NULL;
> + goto fill_page;
> + }
> + }
> +
> + if (!i)
> + return 0;
> +
> + /*
> + * Now we splice them into the pipe..
> + */
> +splice_them:
> + return move_to_pipe(pipe, pages, i, offset, len);
> +}

OK, and this returns the number of bytes transferred all the way up to
userspace.

And, like read(), if we hit an IO error or ENOMEM partway through we'll
just return a short read which is indistinguishable from EOF.

But if we do get an IO error, we'll detect it in page_cache_pipe_buf_map().
Does the code still follow these read() return value semantics in that
case?

> ...

argh, am all reviewed out, and infiniband isn't compiling. Will look later.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/