RE: nfs client readdir caching issue?

From: Trond Myklebust
Date: Mon Jul 07 2008 - 13:20:26 EST


On Thu, 2008-07-03 at 09:47 +0100, Andy Chittenden wrote:
> > If so, then invalidate_inode_pages2_range() would have to be broken:
> we
> > always clear the readdir cache immediately after reading in the page
> > with index 0 (i.e. the first readdir page).
>
> I'm confused by the call to invalidate_inode_pages2_range:
>
> 	if (page->index == 0)
> 		invalidate_inode_pages2_range(inode->i_mapping,
> 				PAGE_CACHE_SIZE, -1);
>
> That passes a pgoff_t of 4096 as the start page offset from which to
> invalidate, and an enormous number as the end page. So it looks like
> the nfs client is trying to invalidate from a *byte* offset of 4096
> (although, if that were true, the first page could contain fewer than
> 4096 bytes, depending on the size of the readdir response it received;
> I'll leave that to one side for the moment).
>
> What's confusing me is that when I look at the implementation of
> invalidate_inode_pages2_range, I see this call to pagevec_lookup:
>
> 	pagevec_lookup(&pvec, mapping, next,
> 			min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
>
> looking at the pagevec_lookup comments, it claims the fourth parameter
> is the number of pages:
>
> * @nr_pages: The maximum number of pages
>
> So how can (end - next) be a number of pages? (next will be 4096 in the
> call from the nfs client.) I.e. it looks like
> invalidate_inode_pages2_range is expecting a page index range (as the
> name suggests), so I'm wondering whether the call to
> invalidate_inode_pages2_range should be:
>
> 	if (page->index == 0)
> 		invalidate_inode_pages2_range(inode->i_mapping, 1, -1);
>
> FWIW, we've also seen this on larger directories, so I'm wondering what
> would happen if a readdir part way down the cookie chain returned more
> (or less) data than it did last time. I.e. if the above is correct, then
> replace the two lines with:
>
> 	invalidate_inode_pages2_range(inode->i_mapping, page->index + 1,
> 			-1);
>
> i.e. purge the rest of the pages for the inode.

Hi Andy,

Apologies for the late response. I was travelling on Thursday, and got
caught up with the long weekend due to the July 4th holiday in the US.


Yes, I agree with your comment above: the
invalidate_inode_pages2_range() call is specifying a byte range instead
of a page index range, and is therefore wrong.

Another thought is that individual pages might get evicted by VM
pressure, in which case we might want to re-read not only the evicted
page, but all subsequent pages too (in case the server returns more or
less data per page, so that the alignment of the next entry changes).

Cheers
Trond
-------------------------------------------------------------------------
From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
NFS: Fix readdir cache invalidation

invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.

Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
---

fs/nfs/dir.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 58d43da..982a206 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -204,7 +204,7 @@ int nfs_readdir_filler(nfs_readdir_descriptor_t *desc, struct page *page)
 	 * Note: assumes we have exclusive access to this mapping either
 	 * through inode->i_mutex or some other mechanism.
 	 */
-	if (page->index == 0 && invalidate_inode_pages2_range(inode->i_mapping, PAGE_CACHE_SIZE, -1) < 0) {
+	if (invalidate_inode_pages2_range(inode->i_mapping, page->index + 1, -1) < 0) {
 		/* Should never happen */
 		nfs_zap_mapping(inode, inode->i_mapping);
 	}


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/