Re: [PATCH v8 02/10] iov_iter: Add a function to extract a page list from an iterator

From: David Hildenbrand
Date: Tue Jan 24 2023 - 09:28:33 EST


On 23.01.23 18:29, David Howells wrote:
Add a function, iov_iter_extract_pages(), to extract a list of pages from
an iterator. The pages may be returned with a pin added or nothing,
depending on the type of iterator.

Add a second function, iov_iter_extract_mode(), to determine how the
cleanup should be done.

There are two cases:

(1) ITER_IOVEC or ITER_UBUF iterator.

Extracted pages will have pins (FOLL_PIN) obtained on them so that a
concurrent fork() will forcibly copy the page so that DMA is done
to/from the parent's buffer and is unavailable to/unaffected by the
child process.

iov_iter_extract_mode() will return FOLL_PIN for this case. The
caller should use something like folio_put_unpin() to dispose of the
page.

(2) Any other sort of iterator.

No refs or pins are obtained on the page; the assumption is made that
the caller will manage page retention.

iov_iter_extract_mode() will return 0. The pages don't need
additional disposal.
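
As an illustration only, the two cases above could be handled by a caller
roughly as follows. This is a userspace mock, not kernel code: struct page,
struct iov_iter, user_backed_iter(), the FOLL_PIN value and the body of
unpin_user_page() are stand-ins so that the sketch compiles on its own, and
release_extracted_pages() is a hypothetical helper name that is not part of
this patch. It uses a mock of unpin_user_page() for the "something like
folio_put_unpin()" step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions) so the sketch builds outside the kernel. */
#define FOLL_PIN 0x40000		/* stand-in value for illustration */

struct page { int pincount; };		/* mock: only tracks pins */
struct iov_iter { bool user_backed; };	/* mock: only the iterator class */

/* Mock of user_backed_iter(): true for ITER_IOVEC/ITER_UBUF iterators. */
#define user_backed_iter(iter) ((iter)->user_backed)

/* Same shape as the macro added by this patch. */
#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0)

/* Mock of unpin_user_page(): drops one pin from the page. */
static void unpin_user_page(struct page *page)
{
	page->pincount--;
}

/*
 * Hypothetical helper: dispose of pages previously extracted from @iter.
 * Case (1): user-backed iterator, every page carries a pin to drop.
 * Case (2): any other iterator, nothing was taken so nothing is released.
 */
static void release_extracted_pages(struct iov_iter *iter,
				    struct page **pages, unsigned int npages)
{
	unsigned int i;

	if (iov_iter_extract_mode(iter) != FOLL_PIN)
		return;

	for (i = 0; i < npages; i++)
		unpin_user_page(pages[i]);
}
```

The point of keying the cleanup on the mode rather than on the iterator type
is that the caller does not need to know which iterator classes are
user-backed.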

Signed-off-by: David Howells <dhowells@xxxxxxxxxx>
cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
cc: Christoph Hellwig <hch@xxxxxx>
cc: John Hubbard <jhubbard@xxxxxxxxxx>
cc: David Hildenbrand <david@xxxxxxxxxx>
cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
cc: linux-fsdevel@xxxxxxxxxxxxxxx
cc: linux-mm@xxxxxxxxx
---

Notes:
ver #8)
- It seems that all DIO is supposed to be done under FOLL_PIN now, and not
FOLL_GET, so switch to only using pin_user_pages() for user-backed
iters.
- Wrap an argument in brackets in the iov_iter_extract_mode() macro.
- Drop the extract_flags argument to iov_iter_extract_mode() for now
[hch].
ver #7)
- Switch to passing in iter-specific flags rather than FOLL_* flags.
- Drop the direction flags for now.
- Use ITER_ALLOW_P2PDMA to request FOLL_PCI_P2PDMA.
- Disallow use of ITER_ALLOW_P2PDMA with non-user-backed iter.
- Add support for extraction from KVEC-type iters.
- Use iov_iter_advance() rather than open-coding it.
- Make BVEC- and KVEC-type skip over initial empty vectors.
ver #6)
- Add back the function to indicate the cleanup mode.
- Drop the cleanup_mode return arg to iov_iter_extract_pages().
- Pass FOLL_SOURCE/DEST_BUF in gup_flags. Check this against the iter
data_source.
ver #4)
- Use ITER_SOURCE/DEST instead of WRITE/READ.
- Allow additional FOLL_* flags, such as FOLL_PCI_P2PDMA to be passed in.
ver #3)
- Switch to using EXPORT_SYMBOL_GPL to prevent indirect 3rd-party access
to get/pin_user_pages_fast()[1].

include/linux/uio.h | 22 +++
lib/iov_iter.c | 320 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 342 insertions(+)

diff --git a/include/linux/uio.h b/include/linux/uio.h
index 46d5080314c6..a8165335f8da 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -363,4 +363,26 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction,
/* Flags for iov_iter_get/extract_pages*() */
#define ITER_ALLOW_P2PDMA 0x01 /* Allow P2PDMA on the extracted pages */
+ssize_t iov_iter_extract_pages(struct iov_iter *i, struct page ***pages,
+ size_t maxsize, unsigned int maxpages,
+ unsigned int extract_flags, size_t *offset0);
+
+/**
+ * iov_iter_extract_mode - Indicate how pages from the iterator will be retained
+ * @iter: The iterator
+ *
+ * Examine the iterator and indicate by returning FOLL_PIN or 0 as to how, if
+ * at all, pages extracted from the iterator will be retained by the extraction
+ * function.
+ *
+ * FOLL_PIN indicates that the pages will have a pin placed on them that the
+ * caller must unpin. This must be done for DMA/async DIO to force fork()
+ * to forcibly copy a page for the child (the parent must retain the original
+ * page).
+ *
+ * 0 indicates that no measures are taken and that it's up to the caller to
+ * retain the pages.
+ */
+#define iov_iter_extract_mode(iter) (user_backed_iter(iter) ? FOLL_PIN : 0)
+

Does it make sense to move that to the patch where it is needed? (Do we need it at all anymore?)

--
Thanks,

David / dhildenb