Re: [PATCH] vmscan: remove wait_on_page_writeback() from pageout()

From: KOSAKI Motohiro
Date: Wed Jul 28 2010 - 05:43:55 EST


> On Wed, Jul 28, 2010 at 04:46:54PM +0800, Wu Fengguang wrote:
> > The wait_on_page_writeback() call inside pageout() is virtually dead code.
> >
> > shrink_inactive_list()
> > shrink_page_list(PAGEOUT_IO_ASYNC)
> > pageout(PAGEOUT_IO_ASYNC)
> > shrink_page_list(PAGEOUT_IO_SYNC)
> > pageout(PAGEOUT_IO_SYNC)
> >
> > Because shrink_page_list/pageout(PAGEOUT_IO_SYNC) is always called after
> > a preceding shrink_page_list/pageout(PAGEOUT_IO_ASYNC), the first
> > pageout(ASYNC) converts dirty pages into writeback pages, the second
> > shrink_page_list(SYNC) waits on the clean of writeback pages before
> > calling pageout(SYNC). The second shrink_page_list(SYNC) can hardly run
> > into dirty pages for pageout(SYNC) unless in some race conditions.
> >
>
> It's possible for the second call to run into dirty pages as there is a
> congestion_wait() call between the first shrink_page_list() call and the
> second. That's a big window.
>
> > And the wait page-by-page behavior of pageout(SYNC) will lead to very
> > long stall time if running into some range of dirty pages.
>
> True, but this is also lumpy reclaim which is depending on a contiguous
> range of pages. It's better for it to wait on the selected range of pages
> which is known to contain at least one old page than excessively scan and
> reclaim newer pages.

Today I succeeded in reproducing Andres's issue, and I disagree with this
opinion.
The root cause is that congestion_wait() means "wait until IO congestion clears",
but if the system has plenty of dirty pages, the flusher threads issue IO continuously.
So the IO congestion is not cleared for a long time, and eventually
congestion_wait(BLK_RW_ASYNC, HZ/10) becomes equivalent to sleep(HZ/10).
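
To illustrate the point, here is a toy model in plain C. It is not kernel
code: toy_queue, toy_congested() and toy_congestion_wait() are hypothetical
names, time is counted in abstract ticks, and the model only follows the
"wait until congestion clears or the timeout expires" semantics described
above, not the real implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: time advances in whole ticks. */
struct toy_queue {
    int congested_until;   /* queue is congested for ticks < congested_until */
};

/* True while the modelled queue is still congested at tick 'now'. */
static bool toy_congested(const struct toy_queue *q, int now)
{
    return now < q->congested_until;
}

/*
 * Sleep until congestion clears or the timeout expires, whichever
 * comes first. Returns the number of ticks slept.
 */
static int toy_congestion_wait(const struct toy_queue *q, int now, int timeout)
{
    int slept = 0;

    assert(timeout >= 0);
    while (slept < timeout && toy_congested(q, now + slept))
        slept++;
    return slept;
}
```

In this model, if congestion ends before the timeout, the caller wakes up
early. If the flusher threads keep the queue congested for longer than the
timeout, the call always returns after the full timeout, which is
indistinguishable from a fixed-length sleep.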

I would propose the following patch instead.

I've also found that synchronous lumpy reclaim has a more serious problem. I would
like to explain it in another mail.

Thanks.