Re: How to flush the disk write cache from userspace

From: Jens Axboe
Date: Mon Jan 15 2007 - 19:38:35 EST


On Sun, Jan 14 2007, Ricardo Correia wrote:
> Hi, (please CC: to my email address, I'm not subscribed)
>
> Quick question: how can I flush the disk write cache from userspace?
>
> Long question:
>
> I'm porting the Solaris ZFS filesystem to the FUSE/Linux filesystem
> framework. This is a copy-on-write, transactional filesystem and so
> it needs to ensure correct ordering of writes when transactions are
> written to disk.
>
> At the moment, when transactions end, I'm using a fsync() on the block
> device followed by a ioctl(BLKFLSBUF).
>
> This is because, according to the fsync manpage, even after fsync()
> returns, data might still be in the disk write cache, so fsync by
> itself doesn't guarantee data safety on power failure.

Depends. Only if the file system does the right thing here, iirc only
reiserfs with barriers enabled issue a real disk flush for fsync. So you
can't rely on it in general.

> I was looking for something like the Solaris
> ioctl(DKIOCFLUSHWRITECACHE), which does exactly what I need.
>
> The most similar thing I could find was ioctl(BLKFLSBUF), however a
> search for BLKFLSBUF on the Linux 2.6.15 source doesn't seem to return
> anything related to IDE or SCSI disks.
>
> Can I trust ioctl(BLKFLSBUF) to flush disks' write caches (for disks
> that follow the specs)?

BLKFLSBUF doesn't flush the disk cache either, it just flushes
every dirty page in the block device address space. It would not be very
hard to do, basically we have most of the support code in place for this
for IO barriers. Basically it would be something like:

blockdev_cache_flush(bdev)
{
request_queue_t *q = bdev_get_queue(bdev);
struct request *rq = blk_get_request(q, WRITE, GFP_WHATEVER);
int ret;

ret = blk_execute_rq(q, bdev->bd_disk, rq, 0);
blk_put_request(rq);
return ret;
}

Somewhat simplified of course, but it should get the point across.
Putting that in fs/buffer.c:sync_blockdev() would make BLKFLSBUF work.

As always with these things, the devil is in the details. It requires
the device to support a ->prepare_flush() queue hook, and not all
devices do that. It will work for IDE/SATA/SCSI, though. In some devices
you don't want/need to do a real disk flush, it depends on the write
cache settings, battery backing, etc.

--
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/