Re: [PATCH 00/18] IO-less dirty throttling v11

From: Trond Myklebust
Date: Wed Sep 07 2011 - 15:15:02 EST


On Wed, 2011-09-07 at 21:32 +0800, Wu Fengguang wrote:
> > Finally, the complete IO-less balance_dirty_pages(). NFS is observed to perform
> > better or worse depending on the memory size. Otherwise the added patches can
> > address all known regressions.
>
> I find that the NFS performance regressions on large-memory systems can
> be fixed by this patch. It tries to make progress smoother by
> reasonably reducing the commit size.
>
> Thanks,
> Fengguang
> ---
> Subject: nfs: limit the commit size to reduce fluctuations
> Date: Thu Dec 16 13:22:43 CST 2010
>
> Limit the commit size to half the dirty control scope, so that the
> arrival of one commit will not knock the overall dirty pages off the
> scope.
>
> Also limit the commit size to one second's worth of data. This will
> obviously help make the pipeline run more smoothly.
>
> Also change "<=" to "<": if an inode has only one dirty page left at the
> end, it should be committed. I wonder why the "<=" didn't cause a bug...
>
> CC: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
> Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> ---
> fs/nfs/write.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> After the patch, there are still drop-offs from the control scope,
>
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/balance_dirty_pages-pages.png
>
> due to the bursty arrival of commits:
>
> http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/NFS/nfs-1dd-1M-8p-2945M-20%25-2.6.38-rc6-dt6+-2011-02-22-21-09/nfs-commit.png
>
> --- linux-next.orig/fs/nfs/write.c 2011-09-07 21:29:15.000000000 +0800
> +++ linux-next/fs/nfs/write.c 2011-09-07 21:29:32.000000000 +0800
> @@ -1543,10 +1543,14 @@ static int nfs_commit_unstable_pages(str
> int ret = 0;
>
> if (wbc->sync_mode == WB_SYNC_NONE) {
> + unsigned long bw = MIN_WRITEBACK_PAGES +
> + NFS_SERVER(inode)->backing_dev_info.avg_write_bandwidth;
> +
> /* Don't commit yet if this is a non-blocking flush and there
> - * are a lot of outstanding writes for this mapping.
> + * are a lot of outstanding writes for this mapping, until
> + * collected enough pages to commit.
> */
> - if (nfsi->ncommit <= (nfsi->npages >> 1))
> + if (nfsi->ncommit < min(nfsi->npages / DIRTY_SCOPE, bw))
> goto out_mark_dirty;
>
> /* don't wait for the COMMIT response */

So what goes into the 'avg_write_bandwidth' variable that makes it a
good measure above (why 1 second of data instead of 10 seconds or
1ms, ...)? What is the 'DIRTY_SCOPE' value?

IOW: what new black magic are we introducing above and why is it so
obviously better than what we have (yes, I see you have graphs, but that
is just measuring _one_ NFS setup and workload).

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
