Re: [PATCH/RFC] Simplified Readahead

From: Andrew Morton
Date: Mon Sep 27 2004 - 14:29:54 EST


Steven Pratt <slpratt@xxxxxxxxxxxxxx> wrote:
>
> >yup. POSIX_FADV_WILLNEED should just populate pagecache and should launch
> >asynchronous I/O.
> >
>
> Well then this could cause problems (other than congestion) on both the
> current and new code since both will effectivly see 2 reads, the second
> of which may appear to be a seek backwards thus confusing the code
> slightly. Would it be best to just special case the POSIX_FADV_WILLNEED
> and issue the I/O required (via do_page_cache_readahead) without
> updating any of the window or current page offset information ?

That's what we do at present. do_page_cache_readahead() and
force_page_cache_readahead() are low-level functions which do not affect
file->ra_state.

Except whoops. POSIX_FADV_WILLNEED is using force_page_cache_readahead(),
which bypasses the congested check. Wonder how that happened.

<digs out the changeset>

ChangeSet 1.1046.589.14 2003/08/01 10:02:32 akpm@xxxxxxxx
[PATCH] rework readahead for congested queues

Since Jens changed the block layer to fail readahead if the queue has no
requests free, a few changes suggest themselves.

- It's a bit silly to go and alocate a bunch of pages, build BIOs for them,
submit the IO only to have it fail, forcing us to free the pages again.

So the patch changes do_page_cache_readahead() to peek at the queue's
read_congested state. If the queue is read-congested we abandon the entire
readahead up-front without doing all that work.

- If the queue is not read-congested, we go ahead and do the readahead,
after having set PF_READAHEAD.

The backing_dev_info's read-congested threshold cuts in when 7/8ths of
the queue's requests are in flight, so it is probable that the readahead
abandonment code in __make_request will now almost never trigger.

- The above changes make do_page_cache_readahead() "unreliable", in that it
may do nothing at all.

However there are some system calls:

- fadvise(POSIX_FADV_WILLNEED)
- madvise(MADV_WILLNEED)
- sys_readahead()

In which the user has an expectation that the kernel will actually
perform the IO.

So the patch creates a new "force_page_cache_readahead()" which will
perform the IO regardless of the queue's congestion state.

Arguably, this is the wrong thing to do: even though the application
requested readahead it could be that the kernel _should_ abandon the user's
request because the disk is so busy.

I don't know. But for now, let's keep the above syscalls behaviour
unchanged. It is trivial to switch back to do_page_cache_readahead()
later.


So there we have it. The reason why normal readahead skips congested
queues is because the block layer will drop the I/O on the floor *anyway*
because it also skips congested queues for readahead I/O requests.

And fadvise() was switched to force_page_cache_readahead() because that was
the old behaviour.

But PF_READAHEAD isn't there any more, and BIO_RW_AHEAD and BIO_RW_BLOCK
are not used anywhere, so we broke that. Jens, do you remember what
happened there?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/