Re: [RFC] kernel facilities for cache prefetching

From: Wu Fengguang
Date: Wed May 03 2006 - 03:12:52 EST


On Tue, May 02, 2006 at 08:55:06AM -0700, Linus Torvalds wrote:
> So I would _seriously_ claim that the place to do all the statistics
> allocation is in anything that ends up having to call "->readpage()", and
> do it all on a virtual mapping level.
>
> Yes, it isn't perfect either (I'll mention some problems), but it's a
> _lot_ better. It means that when you gather the statistics, you can see
> the actual _files_ and offsets being touched. You can even get the
> filenames by following the address space -> inode -> i_dentry list.
>
> This is important for several reasons:
> (a) it makes it a hell of a lot more readable, and the user gets a
> lot more information that may make him see the higher-level issues
> involved.
> (b) it's in the form that we cache things, so if you read-ahead in
> that form, you'll actually get real information.
> (c) it's in a form where you can actually _do_ something about things
> like fragmentation etc ("Oh, I could move these files all to a
> separate area")

There have been two alternatives for me:
1) static/passive interface i.e. the /proc/filecache querier
- user-land tools request to dump the cache contents on demand
2) dynamic/active interface i.e. the readpage() logger
- user-land daemon accepts live page access/io activities

> Now, admittedly it has a few downsides:
>
> - right now "readpage()" is called in several places, and you'd have to
> create some kind of nice wrapper for the most common
> "mapping->a_ops->readpage()" thing and hook into there to avoid
> duplicating the effort.
>
> Alternatively, you could decide that you only want to do this at the
> filesystem level, which actually simplifies some things. If you
> instrument "mpage_readpage[2]()", you'll already get several of the
> ones you care about, and you could do the others individually.
>
> [ As a third alternative, you might decide that the only thing you
> actually care about is when you have to wait on a locked page, and
> instrument the page wait-queues instead. ]
>
> - it will miss any situation where a filesystem does a read some other
> way. Notably, in many loads, the _directory_ accesses are the important
> ones, and if you want statistics for those you'd often have to do that
> separately (not always - some of the filesystems just use the same
> page reading stuff).
>
> The downsides basically boil down to the fact that it's not as clearly
> just one single point. You can't just look at the request queue and see
> what physical requests go out.

Good insights.
The readpage() activities logging idea has been appealing for me.
We might even go further to log mark_page_accessed() calls for more
information.

This approach is more precise, and provides process/page
correlations and time info that the /proc/filecache interface cannot
provide. Though it involves more complexity and overhead(for me they
mean the possibility of being rejected:).

Wu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/