Re: [PATCH] notes on volatile write caches vs fdatasync

From: Christoph Hellwig
Date: Wed Aug 26 2009 - 21:20:26 EST

Not actually a patch, sorry ;-)

On Thu, Aug 27, 2009 at 03:16:24AM +0200, Christoph Hellwig wrote:
> There are two related issues when dealing with volatile write caches.
> The popular and beaten-to-death one is write barriers, which guarantee
> write ordering and stable storage for log writes. For this post
> I naively assume this works perfectly for all filesystems supporting it.
> The second issue is plain cache flushes. Yes, they happen to be the
> base for the barrier implementation on all common disks in Linux, but
> there are cases where we need to issue them even without a log barrier.
> Think about a plain write into a file that is already fully allocated,
> or the O_DIRECT version of the same. If we do an fdatasync after these
> we do expect the write to really be on disk, not just in the disk
> cache, right? The same is true for O_SYNC, but I ignore it for this
> write-up: with Jan's patch series O_SYNC writes will be implemented
> by a range-fdatasync after the actual write, so after that this
> fdatasync discussion covers it, too.
> It appears the following Linux filesystems implement barrier support:
> - btrfs
> - ext3
> - ext4
> - gfs2
> - nilfs2
> - ocfs2
> - reiserfs
> - xfs
> Interestingly, of those only ext4, reiserfs and xfs contain direct
> calls to blkdev_issue_flush. And unless a filesystem really creates
> a transaction for every write and forces it out on fdatasync, it seems
> that all the others have no chance to guarantee a cache flush on
> fdatasync.
> I have tested btrfs, ext3, ext4, reiserfs, and xfs with a simple test
> program that just does a buffered write into a file, and then calls
> fdatasync. All of the above filesystems issue a barrier request
> when the file blocks aren't allocated yet (for ext3 and reiserfs
> only when barriers are explicitly enabled, of course).
> That's not the case anymore when all blocks are already allocated.
> As expected from the grep results above, reiserfs and xfs still issue a
> barrier in that case. Btrfs also performs a cache flush in every
> case, which at first seems unexpected given the lack of any
> blkdev_issue_flush call, but since btrfs is a COW filesystem
> it actually has to allocate blocks even for an overwrite.
> Ext3, as expected, does not issue a cache flush in that case, but ext4
> unexpectedly does not issue a cache flush either. The reason is that
> ext4 only issues the cache flush if the inode was dirty, and not at
> all otherwise.
---end quoted text---
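
For anyone who wants to reproduce the two cases above, here is a minimal
sketch of the kind of test described: a buffered write followed by
fdatasync. This is not the program actually used for the results quoted
above. The helper name write_and_fdatasync, the fill byte, and the sizes
are made up for illustration. The file is opened without O_TRUNC, so the
first call on a fresh file exercises the "blocks not allocated yet" case
and a second call on the same file exercises the plain overwrite case (on
a non-COW filesystem).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: write `len` bytes of `fill`
 * at offset 0 of `path` through the page cache, then call fdatasync().
 * Returns 0 on success, -1 on any error.
 */
static int write_and_fdatasync(const char *path, char fill, size_t len)
{
    char buf[4096];
    size_t done = 0;
    int fd;

    assert(path != NULL);

    /* No O_TRUNC: an existing file keeps its already allocated blocks. */
    fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    memset(buf, fill, sizeof(buf));
    while (done < len) {
        size_t chunk = len - done < sizeof(buf) ? len - done : sizeof(buf);
        ssize_t n = write(fd, buf, chunk);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry the write */
            close(fd);
            return -1;
        }
        done += (size_t)n;      /* handles short writes */
    }

    /* This is the call whose cache-flush behaviour is in question. */
    if (fdatasync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Calling it twice on the same path, e.g. write_and_fdatasync("testfile", 'a',
8192) and then again with 'b', gives the allocating write followed by the
overwrite. The program itself cannot tell whether a cache flush reached the
disk; that has to be observed on the block layer side, for example by
tracing the underlying device with blktrace while the test runs.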