Re: [PATCH v11 2/8] iov_iter: Add a function to extract a page list from an iterator

From: David Hildenbrand
Date: Fri Jan 27 2023 - 07:36:00 EST


On 27.01.23 13:30, Jan Kara wrote:
On Fri 27-01-23 02:02:31, Al Viro wrote:
On Fri, Jan 27, 2023 at 12:44:08AM +0100, David Hildenbrand wrote:
On 26.01.23 23:36, Al Viro wrote:
On Thu, Jan 26, 2023 at 09:59:36PM +0000, Al Viro wrote:
On Thu, Jan 26, 2023 at 02:16:20PM +0000, David Howells wrote:

+/**
+ * iov_iter_extract_will_pin - Indicate how pages from the iterator will be retained
+ * @iter: The iterator
+ *
+ * Examine the iterator and indicate by returning true or false as to how, if
+ * at all, pages extracted from the iterator will be retained by the extraction
+ * function.
+ *
+ * %true indicates that the pages will have a pin placed on them that the
+ * caller must unpin. This must be done for DMA/async DIO to force fork()
+ * to copy a page for the child (the parent must retain the original
+ * page).
+ *
+ * %false indicates that no measures are taken and that it's up to the caller
+ * to retain the pages.
+ */
+static inline bool iov_iter_extract_will_pin(const struct iov_iter *iter)
+{
+ return user_backed_iter(iter);
+}
+
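
For illustration only, a caller-side sketch of the contract described in the comment above: unpin only when the helper reports that the extraction pinned the pages. This is a userspace mock, not kernel code. struct mock_page, struct mock_iter, mock_extract_will_pin() and release_extracted_pages() are hypothetical names, and the decrement stands in for whatever unpin call the real caller would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel definitions. */
struct mock_page {
	int pincount;		/* models pins taken by the extraction function */
};

struct mock_iter {
	bool user_backed;	/* models what user_backed_iter() would report */
};

/* Mirrors the quoted helper: only user-backed iterators yield pinned pages. */
static inline bool mock_extract_will_pin(const struct mock_iter *iter)
{
	return iter->user_backed;
}

/*
 * Release pages once the I/O is done.  Returns the number of pins dropped.
 * When the extraction took no pin, nothing is dropped here; keeping those
 * pages alive was the caller's job all along.
 */
static size_t release_extracted_pages(const struct mock_iter *iter,
				      struct mock_page **pages, size_t nr)
{
	size_t i, dropped = 0;

	assert(pages != NULL || nr == 0);

	if (!mock_extract_will_pin(iter))
		return 0;

	for (i = 0; i < nr; i++) {
		assert(pages[i]->pincount > 0);
		pages[i]->pincount--;	/* stand-in for an unpin call */
		dropped++;
	}
	return dropped;
}
```

The point of the sketch is that the release path keys off the same predicate as the extraction path, so the two cannot disagree about whether a pin exists.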

Wait a sec; why would we want a pin for pages we won't be modifying?
A reference - sure, but...

After having looked through the earlier iterations of the patchset -
sorry, but that won't fly for (at least) vmsplice(). There we can't
pin those suckers;

We'll need a way to pass FOLL_LONGTERM to pin_user_pages_fast() to handle
such long-term pinning as vmsplice() needs. But the release path (unpin)
will be the same.

Umm... Are you saying that if the source area contains DAX mmaps, vmsplice()
from it will fail?

Yes, that's the plan. Because as you wrote elsewhere, it is otherwise too easy
to lock up operations such as truncate(2) on DAX filesystems.

Right, it's then the same behavior as we already have for other FOLL_LONGTERM users, such as RDMA or io_uring.

... if we're afraid of breaking existing setups, we could add some kind of fallback that copies into a buffer, as ordinary pipe writes do.
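
Very roughly, and only as a userspace mock of the control flow, such a fallback could look like the sketch below. Everything in it is hypothetical: mock_longterm_pin() merely models a long-term pin attempt that is refused for DAX-backed memory, the -EOPNOTSUPP and -ENOMEM return values are assumptions of the mock, and mock_vmsplice() is not the real syscall path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of where the source memory comes from. */
enum mock_backing { MOCK_ANON, MOCK_FSDAX };

struct mock_src {
	enum mock_backing backing;
	const char *data;
	size_t len;
};

enum splice_mode { SPLICE_BY_REFERENCE, SPLICE_BY_COPY };

/* Models a long-term pin attempt; assumed to be refused for DAX mappings. */
static int mock_longterm_pin(const struct mock_src *src)
{
	return src->backing == MOCK_FSDAX ? -EOPNOTSUPP : 0;
}

/*
 * Try to hand the pages over by reference.  If the long-term pin is
 * refused, fall back to copying into a buffer, the way an ordinary
 * pipe write would.
 */
static int mock_vmsplice(const struct mock_src *src, char *buf,
			 size_t buflen, enum splice_mode *mode)
{
	int ret;

	assert(src != NULL && buf != NULL && mode != NULL);

	ret = mock_longterm_pin(src);
	if (ret == 0) {
		*mode = SPLICE_BY_REFERENCE;
		return 0;
	}
	if (ret != -EOPNOTSUPP)
		return ret;		/* unrelated failure: no fallback */

	if (src->len > buflen)
		return -ENOMEM;		/* arbitrary choice for the mock */

	memcpy(buf, src->data, src->len);
	*mode = SPLICE_BY_COPY;
	return 0;
}
```

With this shape, existing users would keep working on DAX mappings at the cost of a copy, while everything else keeps the zero-copy path.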

--
Thanks,

David / dhildenb