[RFC][PATCH v2] writeback: limit number of moved inodes in queue_io()

From: Wu Fengguang
Date: Fri May 06 2011 - 06:06:59 EST


On Fri, May 06, 2011 at 04:42:38PM +0800, Wu Fengguang wrote:
> > patched trace-tar-dd-ext4-2.6.39-rc3+
>
> > flush-8:0-3048 [004] 1929.981734: writeback_queue_io: bdi 8:0: older=4296600898 age=2 enqueue=13227
>
> > vanilla trace-tar-dd-ext4-2.6.39-rc3
>
> > flush-8:0-2911 [004] 77.158312: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=18938
>
> > flush-8:0-2911 [000] 82.461064: writeback_queue_io: bdi 8:0: older=0 age=-1 enqueue=6957
>
> It looks too much to move 13227 and 18938 inodes at once. So I tried
> arbitrarily limiting the max move number to 1000 and it helps reduce
> the lock hold time and contentions a lot.

Oh, it seems 1000 is too small, at least for this workload: it hurts
the dd+tar+sync total elapsed time.

                  avg     stddev
no limit:     167.486      8.996
limit=1000:   171.222      5.588
limit=3000:   165.335      5.503

So use 3000 as the new limit.

Thanks,
Fengguang
---
Subject: writeback: limit number of moved inodes in queue_io()
Date: Fri May 06 13:34:08 CST 2011

Move at most 3000 inodes from b_dirty to b_io at a time. In a simple
dd+tar workload on an 8p test box, this cuts the maximum lock hold time
several-fold and roughly halves the lock contentions (see the lock_stat
numbers below). The workload was observed to move 10000+ inodes in one
shot on ext4, which is clearly too much.
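
The idea can be sketched in userspace as follows. This is only an
illustration, not the kernel code: struct node and move_bounded() are
hypothetical names, the list is a plain singly linked list, and the
expiry check done by move_expired_inodes() is left out. The caller is
assumed to hold the list lock around each call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an inode on a writeback list. */
struct node {
	struct node *next;
	int id;
};

/*
 * Move up to `limit` nodes from the head of *src to the head of *dst.
 * Returns the number of nodes moved. Whatever is left on *src is picked
 * up by a later call, so each call does a bounded amount of work while
 * the (assumed) list lock is held.
 */
static int move_bounded(struct node **src, struct node **dst, int limit)
{
	int moved = 0;

	assert(limit > 0);
	while (*src) {
		struct node *n = *src;

		*src = n->next;		/* unlink from the source list */
		n->next = *dst;		/* push onto the destination list */
		*dst = n;
		moved++;
		if (moved >= limit)	/* limit lock hold time */
			break;
	}
	return moved;
}
```

Calling move_bounded() repeatedly drains the source list in batches of
at most `limit` entries, which trades a few extra lock round trips for a
shorter worst-case hold time.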

class name con-bounces contentions waittime-min waittime-max waittime-total acq-bounces acquisitions holdtime-min holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------
vanilla 2.6.39-rc3:
inode_wb_list_lock: 2063 2065 0.12 2648.66 5948.99 27475 943778 0.09 2704.76 498340.24
------------------
inode_wb_list_lock 89 [<ffffffff8115cf3a>] sync_inode+0x28/0x5f
inode_wb_list_lock 38 [<ffffffff8115ccab>] inode_wait_for_writeback+0xa8/0xc6
inode_wb_list_lock 629 [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
inode_wb_list_lock 842 [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157
------------------
inode_wb_list_lock 891 [<ffffffff8115ce3e>] writeback_single_inode+0x175/0x249
inode_wb_list_lock 13 [<ffffffff8115dc4e>] writeback_inodes_wb+0x3a/0x143
inode_wb_list_lock 499 [<ffffffff8115da35>] __mark_inode_dirty+0x170/0x1d0
inode_wb_list_lock 617 [<ffffffff8115d334>] writeback_sb_inodes+0x10f/0x157

limit=1000:

dd+tar+sync total elapsed time (10 runs):
avg 171.222
stddev 5.588

&(&wb->list_lock)->rlock: 842 842 0.14 101.10 1013.34 20489 970892 0.09 234.11 509829.79
------------------------
&(&wb->list_lock)->rlock 275 [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock 114 [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
&(&wb->list_lock)->rlock 56 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock 132 [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
------------------------
&(&wb->list_lock)->rlock 2 [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock 33 [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
&(&wb->list_lock)->rlock 9 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock 430 [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e

limit=3000:

dd+tar+sync total elapsed time (10 runs):
avg 165.335
stddev 5.503

&(&wb->list_lock)->rlock: 1088 1092 0.11 245.08 3268.75 21124 1718636 0.09 384.53 849827.20
------------------------
&(&wb->list_lock)->rlock 518 [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock 3 [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock 54 [<ffffffff8115cf2a>] sync_inode+0x63/0xa2
&(&wb->list_lock)->rlock 10 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
------------------------
&(&wb->list_lock)->rlock 4 [<ffffffff8115dfea>] inode_wb_list_del+0x5f/0x85
&(&wb->list_lock)->rlock 379 [<ffffffff8115db09>] __mark_inode_dirty+0x173/0x1cf
&(&wb->list_lock)->rlock 4 [<ffffffff8115cc29>] inode_wait_for_writeback+0xac/0xcc
&(&wb->list_lock)->rlock 446 [<ffffffff8115cdd3>] writeback_single_inode+0x18a/0x27e
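
For reference, the improvement can be read off the holdtime-max and
contentions columns above: holdtime-max goes from 2704.76 (vanilla) to
234.11 with limit=1000 and to 384.53 with limit=3000, and contentions go
from 2065 to 842 and 1092 respectively. A small sketch of that
arithmetic, where reduction_factor() and print_lock_stat_factors() are
hypothetical helper names and the constants are copied from the tables:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many times smaller `after` is than `before`. */
static double reduction_factor(double before, double after)
{
	assert(after > 0.0);
	return before / after;
}

/* Print the factors for the lock_stat figures quoted in this changelog. */
static void print_lock_stat_factors(void)
{
	const double vanilla_hold_max = 2704.76;
	const double vanilla_contentions = 2065.0;

	printf("holdtime-max, limit=1000: %.1fx\n",
	       reduction_factor(vanilla_hold_max, 234.11));
	printf("holdtime-max, limit=3000: %.1fx\n",
	       reduction_factor(vanilla_hold_max, 384.53));
	printf("contentions,  limit=1000: %.1fx\n",
	       reduction_factor(vanilla_contentions, 842.0));
	printf("contentions,  limit=3000: %.1fx\n",
	       reduction_factor(vanilla_contentions, 1092.0));
}
```

Note that the runs are not normalized against each other (the number of
acquisitions differs between them), so these factors are only a rough
guide.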

Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
fs/fs-writeback.c | 2 ++
1 file changed, 2 insertions(+)

--- linux-next.orig/fs/fs-writeback.c 2011-05-06 13:32:41.000000000 +0800
+++ linux-next/fs/fs-writeback.c 2011-05-06 16:44:58.000000000 +0800
@@ -279,6 +279,8 @@ static int move_expired_inodes(struct li
 		sb = inode->i_sb;
 		list_move(&inode->i_wb_list, &tmp);
 		moved++;
+		if (unlikely(moved >= 3000))	/* limit spinlock hold time */
+			break;
 	}
 
 	/* just one sb in list, splice to dispatch_queue and we're done */