Re: [PATCH] bdi_sync_writeback should WB_SYNC_NONE first

From: Chris Mason
Date: Mon Sep 28 2009 - 09:33:12 EST


On Sun, Sep 27, 2009 at 01:34:58AM -0700, Andrew Morton wrote:
> On Fri, 25 Sep 2009 10:10:14 -0400 Chris Mason <chris.mason@xxxxxxxxxx> wrote:
>
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index 8e1e5e1..27f8e0e 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -225,7 +225,7 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
> > {
> > struct wb_writeback_args args = {
> > .sb = sb,
> > - .sync_mode = WB_SYNC_ALL,
> > + .sync_mode = WB_SYNC_NONE,
> > .nr_pages = LONG_MAX,
> > .range_cyclic = 0,
> > };
> > @@ -236,6 +236,13 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
> >
> > bdi_queue_work(bdi, &work);
> > bdi_wait_on_work_clear(&work);
> > +
> > + args.sync_mode = WB_SYNC_ALL;
> > + args.nr_pages = LONG_MAX;
> > +
> > + work.state = WS_USED | WS_ONSTACK;
> > + bdi_queue_work(bdi, &work);
> > + bdi_wait_on_work_clear(&work);
> > }
>
> Those LONG_MAX's are a worry. What prevents a very long
> almost-livelock from occurring if userspace is concurrently dirtying
> pagecache at a high rate?
>

In this case, we should be called from unmount. But, Jens tells me my
patch isn't quite right because even without my patch, the WB_SYNC_ALL
run is queued onto the tail of the list after the WB_SYNC_NONE run.

So, I need to trace it a little better. My initial theory was that the
nr_dirty number done by the first WB_SYNC_NONE run wasn't big enough.
Once btrfs writepage kicks in, it can make more dirty metadata pages to
close out the delalloc, and if those get written first things could exit
before all the data pages are on disk.

Its a theory anyway, I'll dig in more.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/