Re: [PATCH 002/437] fs: add generic read/write iterator helpers
From: Al Viro
Date:  Mon Apr 15 2024 - 19:43:04 EST
On Mon, Apr 15, 2024 at 03:16:12PM -0600, Jens Axboe wrote:
> The old read/write path only handled user pointers, with the exception
> being bvecs mapped on the io_uring side (which the io_uring version
> dealt with) which also originally came from user pointers. So just user
> memory. Why would we need to expand that now? Who would be doing
> ->read_iter() or ->write_iter() with something that isn't either UBUF or
> IOVEC? Because that would break horrible as it is in the current kernel.
No, it will not.  And yes, it does happen.  Look, for example, at
fs/coredump.c:dump_emit_page().  ->write_iter() (regular file or pipe one)
is called.  On ITER_BVEC.
It happens, and not only there.  Look at how /dev/loop works, for crying
out loud!  You get a struct request; the backing pages might very well have
_never_ been mapped to any userland address (e.g. when it's something like
a directory block).  And you hit this:
static int lo_write_simple(struct loop_device *lo, struct request *rq,
                loff_t pos)
{
        struct bio_vec bvec;
        struct req_iterator iter;
        int ret = 0;
        rq_for_each_segment(bvec, rq, iter) {
                ret = lo_write_bvec(lo->lo_backing_file, &bvec, &pos);
                if (ret < 0)
                        break;
                cond_resched();
        }
        return ret;
}
with lo_write_bvec() being
static int lo_write_bvec(struct file *file, struct bio_vec *bvec, loff_t *ppos)
{
        struct iov_iter i;
        ssize_t bw;
        iov_iter_bvec(&i, ITER_SOURCE, bvec, 1, bvec->bv_len);
        bw = vfs_iter_write(file, &i, ppos, 0);
        if (likely(bw ==  bvec->bv_len))
                return 0;   
        printk_ratelimited(KERN_ERR
                "loop: Write error at byte offset %llu, length %i.\n",
                (unsigned long long)*ppos, bvec->bv_len);
        if (bw >= 0)
                bw = -EIO;
        return bw;
}
Neither ->read_iter() nor ->write_iter() can make an assumption that it
will be used with userland buffer.  And yes, it works...