Steven Pratt <slpratt@xxxxxxxxxxxxxx> wrote:

Don't have SDET, but we have been working on the multiple-processes-reading-the-same-file case on a large (16-way POWER4 with 128GB) machine. We had to apply some fdrcu patches to get past problems in fget_light which were causing 80% spin on the file_lock. We then end up with about 45% spin lock on mapping->tree_lock. This is on vanilla rc2. I know you changed that to a rw lock in your tree and we need to try that as well. Data is not consistent enough to make any conclusions, but I don't see a dramatic change by turning off readahead. I need to do more testing on this to get better results.
> The advantage of the current page-at-a-time code is that the readahead code
> behaves exactly the same, whether the application is doing 256 4k reads or
> one 1M read. Plus it fits the old pagefault requirement.

Yes, but it accomplishes this by possibly making the 1M read slower. And I must admit that I don't know what the "old pagefault requirement" is. Is that something we still need to worry about?
The "old pagefault requirement": the code in there used to perform
readaround at pagefault time as well as readahead at read() time. Hence it
had to work well for single-page requests. That requirement isn't there
any more but some of the code to support it is still there, perhaps.
Ok, we have some data from larger machines. I will collect it all and summarize separately.

> 1. pages already in cache

Yes, we need to handle this. All pages in cache with lots of CPUs hammering the same file is a common case.
Maybe not so significant on small x86, but on large power4 with a higher
lock-versus-memcpy cost ratio, that extra locking will hurt.
SDET would be interesting, as well as explicit testing of lots of processes
reading the same fully-cached file.
I am attaching a reworked patch which now shuts off readahead if 10M (arbitrary value for now) of I/O comes from page cache in a row. Any actual I/O will restart readahead.

I know, but we have to come up with something if we really want to avoid the double lookup.

> cache we should just immediately turn off readahead. What is this trigger
> point? 4 I/Os in a row? 400?

Hard call.
As long as readahead gets fully disabled at some stage, we should be OK.
> We should probably compare i_size with mapping->nrpages at open() time,
> too. No point in enabling readahead if it's all cached. But doing that
> would make performance testing harder, so do it later.

Ok. Sounds good.
Well if the app really does this asynchronously, does it matter that we block?
> I do think we should skip the I/O for POSIX_FADV_WILLNEED against a
> congested queue. I can't immediately think of a good reason for skipping
> the I/O for normal readahead.

Can you expand on the POSIX_FADV_WILLNEED?
It's an application-specified readahead hint. It should ideally be
asynchronous so the application can get some I/O underway while it's
crunching on something else. If the queue is congested then the
application will accidentally block when launching the readahead, which
kinda defeats the purpose.
> Yes, the application will block when it does the subsequent read() anyway,
> but applications expect to block in read(). Seems saner this way.

Just to be sure I have this correct: the readahead code will be invoked once on the POSIX_FADV_WILLNEED request, but this looks mostly like a regular read, and then again for the same pages on a real read?