Re: [PATCH v6 09/19] mm: Add page_cache_readahead_limit

From: Matthew Wilcox
Date: Tue Feb 18 2020 - 14:54:07 EST


On Tue, Feb 18, 2020 at 05:31:10PM +1100, Dave Chinner wrote:
> On Mon, Feb 17, 2020 at 10:45:56AM -0800, Matthew Wilcox wrote:
> > From: "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx>
> >
> > ext4 and f2fs have duplicated the guts of the readahead code so
> > they can read past i_size. Instead, separate out the guts of the
> > readahead code so they can call it directly.
>
> Gross and nasty (hosting non-stale data beyond EOF in the page
> cache, that is).

I thought you meant sneaking changes into the VFS (that were rejected) by
copying VFS code and modifying it ...

> > +/**
> > + * page_cache_readahead_limit - Start readahead beyond a file's i_size.
> > + * @mapping: File address space.
> > + * @file: This instance of the open file; used for authentication.
> > + * @offset: First page index to read.
> > + * @end_index: The maximum page index to read.
> > + * @nr_to_read: The number of pages to read.
> > + * @lookahead_size: Where to start the next readahead.
> > + *
> > + * This function is for filesystems to call when they want to start
> > + * readahead potentially beyond a file's stated i_size. If you want
> > + * to start readahead on a normal file, you probably want to call
> > + * page_cache_async_readahead() or page_cache_sync_readahead() instead.
> > + *
> > + * Context: File is referenced by caller. Mutexes may be held by caller.
> > + * May sleep, but will not reenter filesystem to reclaim memory.
> > */
> > -void __do_page_cache_readahead(struct address_space *mapping,
> > - struct file *filp, pgoff_t offset, unsigned long nr_to_read,
> > - unsigned long lookahead_size)
> > +void page_cache_readahead_limit(struct address_space *mapping,
>
> ... I don't think the function name conveys it's purpose. It's
> really a ranged readahead that ignores where i_size lies. i.e
>
> page_cache_readahead_range(mapping, start, end, nr_to_read)
>
> seems like a better API to me, and then you can drop the "start
> readahead beyond i_size" comments and replace it with "Range is not
> limited by the inode's i_size and hence can be used to read data
> stored beyond EOF into the page cache."

I'm concerned that calling it 'range' implies "I want to read between
start and end" rather than "I want to read nr_to_read at start, oh but
don't go past end".

Maybe the right way to do this is have the three callers cap nr_to_read.
Well, the one caller ... after all, f2fs and ext4 have no desire to
cap the length. Then we can call it page_cache_readahead_exceed() or
page_cache_readahead_dangerous() or something else like that to make it
clear that you shouldn't be calling it.

> Also: "This is almost certainly not the function you want to call.
> Use page_cache_async_readahead or page_cache_sync_readahead()
> instead."

+1 to that ;-)

Here's what I currently have: