Re: Random file I/O regressions in 2.6

From: Ram Pai
Date: Mon May 03 2004 - 19:52:40 EST


On Mon, 2004-05-03 at 17:19, Nick Piggin wrote:
> Andrew Morton wrote:
> > Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:
> >
> >>>That's one of its usage patterns. It's also supposed to detect the
> >>>fixed-sized-reads-seeking-all-over-the-place situation. In which case it's
> >>>supposed to submit correctly-sized multi-page BIOs. But it's not working
> >>>right for this workload.
> >>>
> >>>A naive solution would be to add special-case code which always does the
> >>>fixed-size readahead after a seek. Basically that's
> >>>
> >>> if (ra->next_size == -1UL)
> >>> force_page_cache_readahead(...)
> >>>
> >>
> >>I think a better solution to this case would be to ensure the
> >>readahead window is always min(size of read, some large number);
> >>
> >
> >
> > That would cause the kernel to perform lots of pointless pagecache lookups
> > when the file is already 100% cached.
> >
>
>
> That's pretty sad. You need a "preread" or something which
> sends the pages back... or uses the actor itself. readahead
> would then have to be reworked to only run off the end of
> the read window, but that is what it should be doing anyway.

Sorry, If I am saying this again. I have checked the behaviour of the
readahead code using my user level simulator as well as running some
DSS benchmark and iozone benchmark. It generates a steady stream of
large i/o for large-random-reads and should not exhibit the bad behavior
that we are seeing. I feel this bad behavior is because of interleaved
access by multiple thread.

To illustrate with an example:

t1 request reads from page 100 to 104
simultaneously t2 requests reads on the same fd from 200 to 204

So do_page_cache_readahead() can be called in the following pattern.
100,200,101,201,102,202,103,203,104,204.
Because of this pattern the readahaed code assumes that the read pattern
is absolutely random and hence closes the readahead window.

I think I should generate a patch to validate this behavior, I will.
How about having some /proc counters that keep track of number of
window-closes because of cache-hits and because of cache-misses?

RP

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/