[RFC] iov_iter_get_pages() semantics

From: Al Viro
Date: Tue Mar 31 2015 - 22:33:19 EST


On Mon, Dec 08, 2014 at 10:57:31AM -0800, Linus Torvalds wrote:
> actually, no we cannot. Thinking some more about it, that
> "get_page(page)" is wrong in _all_ cases. It actually works better for
> vmalloc pages than for normal 1:1 pages, since it's actually seriously
> and *horrendously* wrong for the case of random kernel addresses which
> may not even be refcounted to begin with.
>
> So the whole "get_page()" thing is broken. Iterating over pages in a
> KVEC is simply wrong, wrong, wrong. It needs to fail.
>
> Iterating over a KVEC to *copy* data is ok. But no page lookup stuff
> or page reference things.

Hmm... FWIW, for ITER_KVEC the underlying data would bloody better not
go away anyway - vmalloc space or not. Protecting the object from being
freed under us is caller's responsibility and caller can guarantee that.
Would a variant that does kmap_to_page()/vmalloc_to_page() _without_
get_page() for ITER_KVEC work sanely?

Of course, that would have to be used with matching primitive for releasing
those suckers - page_cache_release() for ITER_IOVEC (and ITER_BVEC, while
we are at it - those are backed with normal pages) and nothing for ITER_KVEC
ones.

It would make life much more pleasant for fuse and zerocopy side of 9p - the
latter does pretty much that kind of thing anyway...

Comments?
Al, digging himself from under a huge pile of mail...

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/