Re: [PATCH] writeback: remove unnecessary wait inthrottle_vm_writeout()

From: Andrew Morton
Date: Thu Sep 27 2007 - 16:48:00 EST


On Thu, 27 Sep 2007 09:50:16 +0800
Fengguang Wu <wfg@xxxxxxxxxxxxxxxx> wrote:

> We don't want to introduce pointless delays in throttle_vm_writeout()
> when the writeback limits are not yet exceeded, do we?
>
> Cc: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
> Cc: OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
> Cc: Kumar Gala <galak@xxxxxxxxxxxxxxxxxxx>
> Cc: Pete Zaitcev <zaitcev@xxxxxxxxxx>
> Cc: Greg KH <greg@xxxxxxxxx>
> Signed-off-by: Fengguang Wu <wfg@xxxxxxxxxxxxxxxx>
> ---
> mm/page-writeback.c | 18 ++++++++----------
> 1 file changed, 8 insertions(+), 10 deletions(-)
>
> --- linux-2.6.23-rc8-mm1.orig/mm/page-writeback.c
> +++ linux-2.6.23-rc8-mm1/mm/page-writeback.c
> @@ -507,16 +507,6 @@ void throttle_vm_writeout(gfp_t gfp_mask
> long background_thresh;
> long dirty_thresh;
>
> - if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> - /*
> - * The caller might hold locks which can prevent IO completion
> - * or progress in the filesystem. So we cannot just sit here
> - * waiting for IO to complete.
> - */
> - congestion_wait(WRITE, HZ/10);
> - return;
> - }
> -
> for ( ; ; ) {
> get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
>
> @@ -530,6 +520,14 @@ void throttle_vm_writeout(gfp_t gfp_mask
> global_page_state(NR_WRITEBACK) <= dirty_thresh)
> break;
> congestion_wait(WRITE, HZ/10);
> +
> + /*
> + * The caller might hold locks which can prevent IO completion
> + * or progress in the filesystem. So we cannot just sit here
> + * waiting for IO to complete.
> + */
> + if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO))
> + break;
> }
> }
>

This is a pretty major bugfix.

GFP_NOIO and GFP_NOFS callers should have been spending really large
amounts of time stuck in that sleep.

I wonder why nobody noticed this happening. Either a) it turns out that
kswapd is doing a good job and such callers don't do direct reclaim much or
b) nobody is doing any in-depth kernel instrumentation.

Now, how _would_ one notice this problem? We don't have very good tools,
really. Booting with "profile=sleep" and looking at the profile data would
be one way. Repeatedly doing sysrq-T is another. Perhaps the new
lockstat-via-lockdep code would allow this to be observed in some fashion,
dunno.

Anyway, this patch has the potential to significantly alter the dynamics of
the VM behaviour under particular workloads. It might turn up other
stuff...

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/