[PATCH] ext4: Fix use of write barrier in commit logic

From: Theodore Ts'o
Date: Mon May 19 2008 - 22:40:52 EST


We were previously using set_buffer_ordered() and then calling
submit_bh() to write the commit block, which resulted in this flush
behavior:

write log blocks
blkdev_issue_flush() // flush #1
write commit block
blkdev_issue_flush() // flush #2
write metadata blocks

And if the journal_async_commit mount option was used, the commit was
written without using set_buffer_ordered(), which resulted in no
flushes happening at all.

Change the commit code to use blkdev_issue_flush() explicitly, and to
omit flush #1 if journal_async_commit is enabled, but to always
perform the second flush if write barriers are enabled.

This change should hopefully reduce the overhead of using write
barriers when an ext4 filesystem is mounted using
"barrier=1,journal_async_commit", while still maintaining filesystem
safety. If this works out, we should probably make this the default
mode for ext4.

Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx>
---
fs/jbd2/commit.c | 44 +++++++++++++++++---------------------------
1 files changed, 17 insertions(+), 27 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 3041d75..3fe5be5 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -22,6 +22,7 @@
#include <linux/pagemap.h>
#include <linux/jiffies.h>
#include <linux/crc32.h>
+#include <linux/blkdev.h>

/*
* Default IO end handler for temporary BJ_IO buffer_heads.
@@ -111,7 +112,6 @@ static int journal_submit_commit_record(journal_t *journal,
struct commit_header *tmp;
struct buffer_head *bh;
int ret;
- int barrier_done = 0;
struct timespec now = current_kernel_time();

if (is_journal_aborted(journal))
@@ -147,34 +147,24 @@ static int journal_submit_commit_record(journal_t *journal,
if (journal->j_flags & JBD2_BARRIER &&
!JBD2_HAS_INCOMPAT_FEATURE(journal,
JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT)) {
- set_buffer_ordered(bh);
- barrier_done = 1;
+ ret = blkdev_issue_flush(bh->b_bdev, NULL);
+ if (ret == -EOPNOTSUPP) {
+ char b[BDEVNAME_SIZE];
+
+ printk(KERN_WARNING
+ "JBD: barrier-based sync failed on %s - "
+ "disabling barriers\n",
+ bdevname(journal->j_dev, b));
+ spin_lock(&journal->j_state_lock);
+ journal->j_flags &= ~JBD2_BARRIER;
+ spin_unlock(&journal->j_state_lock);
+ }
}
ret = submit_bh(WRITE, bh);
- if (barrier_done)
- clear_buffer_ordered(bh);

- /* is it possible for another commit to fail at roughly
- * the same time as this one? If so, we don't want to
- * trust the barrier flag in the super, but instead want
- * to remember if we sent a barrier request
- */
- if (ret == -EOPNOTSUPP && barrier_done) {
- char b[BDEVNAME_SIZE];
-
- printk(KERN_WARNING
- "JBD: barrier-based sync failed on %s - "
- "disabling barriers\n",
- bdevname(journal->j_dev, b));
- spin_lock(&journal->j_state_lock);
- journal->j_flags &= ~JBD2_BARRIER;
- spin_unlock(&journal->j_state_lock);
+ if (!ret && journal->j_flags & JBD2_BARRIER)
+ ret = blkdev_issue_flush(bh->b_bdev, NULL);

- /* And try again, without the barrier */
- set_buffer_uptodate(bh);
- set_buffer_dirty(bh);
- ret = submit_bh(WRITE, bh);
- }
*cbh = bh;
return ret;
}
--
1.5.4.1.144.gdfee-dirty

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/