Re: [PATCH v7 01/31] iov_iter: Add ITER_XARRAY

From: Matthew Wilcox
Date: Fri Apr 23 2021 - 10:07:05 EST


On Fri, Apr 23, 2021 at 02:28:01PM +0100, David Howells wrote:
> +#define iterate_xarray(i, n, __v, skip, STEP) { \
> + struct page *head = NULL; \
> + size_t wanted = n, seg, offset; \
> + loff_t start = i->xarray_start + skip; \
> + pgoff_t index = start >> PAGE_SHIFT; \
> + int j; \
> + \
> + XA_STATE(xas, i->xarray, index); \
> + \
> + rcu_read_lock(); \
> + xas_for_each(&xas, head, ULONG_MAX) { \
> + if (xas_retry(&xas, head)) \
> + continue; \
> + if (WARN_ON(xa_is_value(head))) \
> + break; \
> + if (WARN_ON(PageHuge(head))) \
> + break; \
> + for (j = (head->index < index) ? index - head->index : 0; \
> + j < thp_nr_pages(head); j++) { \

if head->index > index, something has gone disastrously wrong.

for (j = index - head->index; j < thp_nr_pages(head); j++) { \

would be enough.

However ... the tree you were originally testing this against has the
page cache fixed to use only one entry per THP. The tree you want to
apply this to inserts 2^n entries per THP. They're all the head page,
but they're distinct entries as far as xas_for_each() is concerned.
So I think the loop you want looks like this:

+ rcu_read_lock(); \
+ xas_for_each(&xas, head, ULONG_MAX) { \
+ if (xas_retry(&xas, head)) \
+ continue; \
+ if (WARN_ON(xa_is_value(head))) \
+ break; \
+ if (WARN_ON(PageHuge(head))) \
+ break; \
+ __v.bv_page = head + index - head->index; \
+ offset = offset_in_page(i->xarray_start + skip); \
+ seg = PAGE_SIZE - offset; \
+ __v.bv_offset = offset; \
+ __v.bv_len = min(n, seg); \
+ (void)(STEP); \
+ n -= __v.bv_len; \
+ skip += __v.bv_len; \
+ if (n == 0) \
+ break; \
+ } \
+ rcu_read_unlock(); \
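
To illustrate the subpage arithmetic, here is a minimal userspace sketch,
not kernel code. struct model_page, MODEL_THP_NR, model_subpage() and
model_walk() are all hypothetical names made up for the illustration. It
models a tree with one entry per subpage, every entry pointing at the head
page, and assumes the index used is that of the entry currently being
visited.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_THP_NR 4UL	/* hypothetical order-2 THP: 4 subpages */

/* Stand-in for struct page; only the head's index is meaningful here. */
struct model_page {
	unsigned long index;
};

/* Same arithmetic as "head + index - head->index" in the loop above. */
static struct model_page *model_subpage(struct model_page *head,
					unsigned long index)
{
	return head + (index - head->index);
}

/*
 * Model a tree holding one entry per subpage, every entry pointing at the
 * head page. Returns how many entries were walked; the assert checks that
 * each entry resolves to its own subpage.
 */
static size_t model_walk(void)
{
	struct model_page pages[MODEL_THP_NR] = { { .index = 8 } };
	struct model_page *slots[MODEL_THP_NR];
	size_t i, walked = 0;

	for (i = 0; i < MODEL_THP_NR; i++)
		slots[i] = &pages[0];		/* all entries are the head */

	for (i = 0; i < MODEL_THP_NR; i++) {
		/* assumed: file index of the entry being visited */
		unsigned long index = pages[0].index + i;
		struct model_page *page = model_subpage(slots[i], index);

		assert(page == &pages[i]);
		walked++;
	}
	return walked;
}
```

In this model each of the four entries yields a different subpage, so no
inner loop over thp_nr_pages() is needed when the tree holds one entry per
subpage.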

Now, is this important? There are no filesystems which do I/O to THPs
today. So it's not possible to pick up the fact that it doesn't work,
and I hope to have the page cache fixed soon. And fixing this now
will create more work later as part of fixing the page cache. But I
wouldn't feel right not mentioning this problem ...

(also, iov_iter really needs to be fixed to handle bvecs which cross
page boundaries, but that's a fight for another day)