[PATCH] jbd2: avoid the concurrent data writeback

From: Feng Tang
Date: Mon Nov 15 2010 - 08:06:44 EST


When dd-ing a big file to an ext4 partition, it is very likely that
both the background flush thread and kjournald try to do data
writeback for it: the flush thread is doing the writeback for this
file while the jbd2 thread is also woken up to commit the transaction.
Because kjournald only calls generic_writepages(), whose path
doesn't really allocate disk blocks, ext4_writepage() may be called
lots of times (100000+ for a 1G file dd) without really writing one page
back (skipped), which consumes a lot of unnecessary CPU time.
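
To make the wasted work concrete, here is a small user-space toy model
(not kernel code). All toy_* names are hypothetical and only mimic the
roles of generic_writepages() and ext4_writepage() as described above:
a per-page hook that is not allowed to allocate blocks gets called once
per dirty page, yet writes nothing when every page is still delalloc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache page; not a kernel type. */
struct toy_page {
	bool dirty;	/* page holds data waiting to be written */
	bool delalloc;	/* no disk block allocated yet (delayed allocation) */
};

struct toy_stats {
	size_t writepage_calls;	/* how many times the per-page hook ran */
	size_t pages_written;	/* how many pages were actually written */
};

/* Models a writepage hook that may not allocate blocks. */
void toy_writepage(struct toy_page *page, struct toy_stats *st)
{
	st->writepage_calls++;
	if (page->delalloc)
		return;		/* skipped: the page stays dirty */
	page->dirty = false;
	st->pages_written++;
}

/* Models a generic walk that calls the hook for every dirty page. */
struct toy_stats toy_generic_writepages(struct toy_page *pages, size_t n)
{
	struct toy_stats st = { 0, 0 };
	size_t i;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (pages[i].dirty)
			toy_writepage(&pages[i], &st);
	return st;
}
```

With every page dirty and delalloc, the call counter equals the number
of pages while the written counter stays at zero, which is the pattern
the trace shows.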

This can be seen with a simple ftrace test case (run from the tracing directory):
$ sync;
$ echo 40960 > buffer_size_kb;echo 1 > events/writeback/enable;echo 1 > events/jbd2/enable;echo 1 > events/ext4/enable;
$ dd if=/dev/zero of=/home/test/1g.bin bs=1M count=1024;sync;
$ cat trace > /home/test/jbd2_ext4_1g_dd.log
$ grep -c wbc_writepage /home/test/jbd2_ext4_1g_dd.log

This patch checks whether the inode is under data syncing; if so, it
doesn't start the writeback from kjournald.
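
The idea of the check can be sketched in user space as follows. This is
only an illustration under simplifying assumptions: struct toy_inode,
TOY_I_SYNC and toy_submit_inode_data() are hypothetical names, and the
sketch reads the flag without any locking, whereas the real patch reads
i_state under inode_lock.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_I_SYNC 0x1u	/* hypothetical bit, plays the role of I_SYNC */

/* Hypothetical stand-in for an inode; not a kernel type. */
struct toy_inode {
	unsigned int state;	/* flag word, plays the role of i_state */
	size_t dirty_pages;	/* pages a per-page walk would visit */
	size_t writepage_calls;	/* running total of per-page calls */
};

/*
 * Models the commit-side submit path: if another writer has already
 * marked the inode as syncing, do nothing; otherwise visit each dirty
 * page once. Returns the number of per-page calls made by this call.
 */
size_t toy_submit_inode_data(struct toy_inode *inode)
{
	size_t calls;

	assert(inode != NULL);
	if (inode->state & TOY_I_SYNC)
		return 0;	/* flush thread is on it: skip */

	calls = inode->dirty_pages;
	inode->writepage_calls += calls;
	return calls;
}
```

An inode with the flag set costs no per-page calls, while an idle inode
is still walked as before, so the commit path keeps its old behaviour
when the flush thread is not involved.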

Perf statistics (on my Core Duo 2 + 4G RAM + SATA disk + ext4 in all default modes):
before the patch > 112191 writeback:wbc_writepage # 0.005 M/sec
after the patch > 54 writeback:wbc_writepage # 0.000 M/sec

Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
---
fs/jbd2/commit.c | 11 +++++++++++
1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index f3ad159..0f3e356 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -170,6 +170,10 @@ static int journal_wait_on_commit_record(journal_t *journal,
* We don't do block allocation here even for delalloc. We don't
* use writepages() because with dealyed allocation we may be doing
* block allocation in writepages().
+ *
+ * Sometimes when this gets called, the host inode may be under data
+ * syncing initiated by the flush thread (especially for a large file);
+ * in that situation, we should skip this path of writeback.
*/
static int journal_submit_inode_data_buffers(struct address_space *mapping)
{
@@ -181,6 +185,13 @@ static int journal_submit_inode_data_buffers(struct address_space *mapping)
.range_end = i_size_read(mapping->host),
};

+ spin_lock(&inode_lock);
+ if (mapping->host->i_state & I_SYNC) {
+ spin_unlock(&inode_lock);
+ return 0;
+ }
+ spin_unlock(&inode_lock);
+
ret = generic_writepages(mapping, &wbc);
return ret;
}
--
1.6.3.3