Re: [PATCH 2/4] direct-io: make O_DIRECT IO path be page based

From: Jens Axboe
Date: Mon Aug 24 2009 - 03:59:07 EST


On Thu, Aug 20 2009, Trond Myklebust wrote:
> On Thu, 2009-08-20 at 12:17 +0200, Jens Axboe wrote:
> > Currently we pass in the iovec array and let the O_DIRECT core
> > handle the get_user_pages() business. This works, but it means that
> > we can only ever use user pages for O_DIRECT.
> >
> > Switch the aops->direct_IO() and below code to use page arrays
> > instead, so that it doesn't make any assumptions about who the pages
> > belong to. This works directly for all users but NFS, which just
> > uses the same helper that the generic mapping read/write functions
> > also call.
> >
> > Signed-off-by: Jens Axboe <jens.axboe@xxxxxxxxxx>
> > ---
> > static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
> > - const struct iovec *iov,
> > - loff_t pos, int sync)
> > + struct dio_args *args,
> > + int sync)
> > {
> > struct nfs_open_context *ctx = dreq->ctx;
> > struct inode *inode = ctx->path.dentry->d_inode;
> > - unsigned long user_addr = (unsigned long)iov->iov_base;
> > - size_t count = iov->iov_len;
> > + unsigned long user_addr = args->user_addr;
> > + size_t count = args->length;
> > struct rpc_task *task;
> > struct rpc_message msg = {
> > .rpc_cred = ctx->cred,
> > @@ -726,24 +702,8 @@ static ssize_t nfs_direct_write_schedule_segment(struct nfs_direct_req *dreq,
> > if (unlikely(!data))
> > break;
> >
> > - down_read(&current->mm->mmap_sem);
> > - result = get_user_pages(current, current->mm, user_addr,
> > - data->npages, 0, 0, data->pagevec, NULL);
> > - up_read(&current->mm->mmap_sem);
> > - if (result < 0) {
> > - nfs_writedata_free(data);
> > - break;
> > - }
> > - if ((unsigned)result < data->npages) {
> > - bytes = result * PAGE_SIZE;
> > - if (bytes <= pgbase) {
> > - nfs_direct_release_pages(data->pagevec, result);
> > - nfs_writedata_free(data);
> > - break;
> > - }
> > - bytes -= pgbase;
> > - data->npages = result;
> > - }
> > + data->pagevec = args->pages;
> > + data->npages = args->nr_segs;
> >
> > get_dreq(dreq);
> >
>
> This looks a bit odd. What guarantees that args->pages contains <= wsize
> bytes? The server will not accept larger segments in a single RPC call.

Nothing, thanks for the info. The NFS bits are still very much untested,
I'll post an update soon.
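
For illustration only, the constraint raised above (no single RPC may carry
more than wsize bytes) could be met by slicing the incoming page array into
wsize-bounded chunks before each request is built. The following is a minimal
userspace sketch of that arithmetic, not kernel code and not the eventual fix.
The names struct wsize_chunk, split_by_wsize(), sketch_page_array_len() and
SKETCH_PAGE_SIZE are all hypothetical, and a 4096-byte page is assumed.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, illustration only */

/* One RPC-sized slice of a larger page array (hypothetical type). */
struct wsize_chunk {
	size_t first_page;	/* index of the first page in the caller's array */
	size_t npages;		/* number of pages this chunk touches */
	size_t pgbase;		/* byte offset into the first page */
	size_t bytes;		/* payload bytes, never more than wsize */
};

/* Pages spanned by len bytes that start base bytes into the first page. */
static size_t sketch_page_array_len(size_t base, size_t len)
{
	return (base + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Split count bytes, starting pgbase bytes into page 0, into chunks of at
 * most wsize bytes each. Writes up to max_chunks entries into out and
 * returns how many were written.
 */
static size_t split_by_wsize(size_t pgbase, size_t count, size_t wsize,
			     struct wsize_chunk *out, size_t max_chunks)
{
	size_t n = 0;
	size_t page = 0;

	assert(wsize > 0);
	assert(pgbase < SKETCH_PAGE_SIZE);

	while (count > 0 && n < max_chunks) {
		size_t bytes = count < wsize ? count : wsize;

		out[n].first_page = page;
		out[n].pgbase = pgbase;
		out[n].bytes = bytes;
		out[n].npages = sketch_page_array_len(pgbase, bytes);
		n++;

		/* Advance to where the next chunk starts. */
		pgbase += bytes;
		page += pgbase / SKETCH_PAGE_SIZE;
		pgbase %= SKETCH_PAGE_SIZE;
		count -= bytes;
	}
	return n;
}
```

With pgbase = 512, count = 10000 and wsize = 4096, this yields three chunks.
Consecutive chunks may share a boundary page when the data is not page
aligned, so real code would also have to get the page reference counting
right for such shared pages; the sketch ignores that.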

--
Jens Axboe
