Re: [PATCH/RFC] Simplified Readahead

From: Steven Pratt
Date: Fri Sep 24 2004 - 17:41:59 EST


Andrew Morton wrote:

Steven Pratt <slpratt@xxxxxxxxxxxxxx> wrote:


The advantage of the current page-at-a-time code is that the readahead code
behaves exactly the same, whether the application is doing 256 4k reads or
one 1M read. Plus it fits the old pagefault requirement.



Yes, but it accomplishes this by possible making the 1M slower. And I must admit that I don't know what the "old pagefault requirement" is. Is that something we still need to worry about?



The "old pagefault requirement": the code in there used to perform
readaround at pagefault time as well as readahead at read() time. Hence it
had to work well for single-page requests. That requirement isn't there
any more but some of the code to support it is still there, perhaps.



1. pages already in cache




Yes, we need to handle this. All pages in cache with lots of CPUs
hammering the same file is a common case.

Maybe not so significant on small x86, but on large power4 with a higher
lock-versus-memcpy cost ratio, that extra locking will hurt.



Ok, we have some data from larger machines. I will collect it all and summarize separately.



SDET would be interesting, as well as explicit testing of lots of processes
reading the same fully-cached file.


Don't have SDET but we have been working on the multiple processes reading same file case on a large(16way POWER4 with 128GB) machine. We had to apply some fdrcu patches to get past problems in fget_light which were causing 80%spin on the file_lock. We then end up with anout 45% spin lock on mapping->tree_lock. This is on vanilla rc2. I know you changed that to a rw lock in your tree and we nee to try that as well.. Data is not consistant enough to make any conclusions, but I don't see a dramatic change by turning off readahead. I need to do more testing on this to get better results.

In any case I agree that we should deal with this case.

cache we should just immediately turn off readahead. What is this trigger point? 4 I/Os in a row? 400?


Hard call.



I know, but we have to come up with something if we really want to avoid the double lookup.



As long as readahead gets fully disabled at some stage, we should be OK.


I am attaching a reworked patch which now shuts off readahead if 10M (arbitrary value for now) of I/O comes from page cache in a row. Any actual I/O will restart readahead.

We should probably compare i_size with mapping->nrpages at open() time,
too. No point in enabling readahead if it's all cached. But doing that
would make performance testing harder, so do it later.


Ok. Sounds good.



I do think we should skip the I/O for POSIX_FADV_WILLNEED against a
congested queue. I can't immediately think of a good reason for skipping
the I/O for normal readahead.




Can you expand on the POSIX_FADV_WILLNEED.



It's an application-specified readahead hint. It should ideally be
asynchronous so the application can get some I/O underway while it's
crunching on something else. If the queue is contested then the
application will accidentally block when launching the readahead, which
kinda defeats the purpose.


Well if the app really does this asynchronously, does it matter that we block?

Yes, the application will block when it does the subsequent read() anyway,
but applications expect to block in read(). Seems saner this way.

Just to be sure I have this correct, the readahead code will be invoked once on the POSIX_FADV_WILLNEED request, but this looks mostly like a regular read, and then again for the same pages on a real read?

Steve


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/