Re: clustering page-ins

Rik van Riel (riel@nl.linux.org)
Tue, 20 Jul 1999 22:17:48 +0200 (CEST)


On Tue, 20 Jul 1999, Chuck Lever wrote:
> On Tue, 20 Jul 1999, Rik van Riel wrote:
> > By defining the top and bottom 1/8th of each zone a 'border'
> > page and loading the next zone when a page in the border zone
> > is loaded (AND the requested page is already in memory, meaning
> > we have already done readahead on the current zone and it proved
> > to be succesful), THEN we read in the zone nextdoor.
>
> how do you prevent this algorithm from reading behind when a
> sequential forward read goes through the bottom 1/8th of a zone
> that is already in cache?

Damn. I really should refrain from posting to linux-kernel
just because I can't sleep :)

> as jamie says, it's easy to recognize strict monotonically
> increasing sequential access to pages and aggressively read ahead
> in that case.

OK. Then I guess we want to implement this for mmap() and
swap I/O.

> is reading backwards prevalent enough that special logic should be
> added to watch for it?

Now that you've pointed out the extra complexity involved, I
don't think it is worth it. Read-ahead and read-around should
be enough for most cases...

> > Clustering does indeed work in this way. I have observed
> > (during a busy time of my emergeny packetstorm mirror) a
> > VERY loaded 486/66 doing about 7000(!) pagefaults a second
> > but 'only' 70 disk operations a second.
>
> i was narrowly defining read-ahead as speculatively reading pages
> that may be needed in the future. read-ahead doesn't necessarily
> mean reading in large blocks, nor does it necessarily mean reading
> sequentially in order to reduce seeks. clustering *causes*
> read-ahead, but also increases the size of read operations. i
> think of clustering and read-ahead as separate, but related,
> concepts. :)

They might be separate, but I will argue that read-ahead
without clustering isn't worth much. Sure, you'll reduce
I/O latency of a single app when you get the data in
advance, but it will still break down when more processes
need access to the disk.

Clustering allows us to minimize the (absurdly expensive)
disk seeks so that we can increase filesystem throughput
to a level that's actually enough to service multiple apps
with minimal latency (if we read the data in advance).

Since these goals are so much alike, I don't think we
should make the (artificial) distinction you are making.

We should, IMHO, take the time to think up a simple
algorithm that approaches both goals without too much
hassles.

regards,

Rik -- Open Source: you deserve to be in control of your data.
+-------------------------------------------------------------------+
| Le Reseau netwerksystemen BV: http://www.reseau.nl/ |
| Linux Memory Management site: http://www.linux.eu.org/Linux-MM/ |
| Nederlandse Linux documentatie: http://www.nl.linux.org/ |
+-------------------------------------------------------------------+

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/