Re: [PATCH 1/7] block: Add block_flush_device()

From: Jeff Garzik
Date: Mon Mar 30 2009 - 15:18:33 EST

Next message: Jeff Mahoney: "Re: [patch 32/35 code-cleanup] reiserfs: rename p_s_inode to inode"
Previous message: Jesse Barnes: "Re: [PATCH] pci: don't enable too much HT MSI mapping -v6"
In reply to: Jens Axboe: "Re: [PATCH 1/7] block: Add block_flush_device()"
Next in thread: Chris Mason: "Re: [PATCH 1/7] block: Add block_flush_device()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Jens Axboe wrote:

On Mon, Mar 30 2009, Linus Torvalds wrote:

On Mon, 30 Mar 2009, Jens Axboe wrote:
The problem is that we may not know upfront, so it sort-of has to be
this trial approach where the first barrier issued will notice and fail
with -EOPNOTSUPP.

Well, absolutely. Except I don't think you shoul use ENOTSUPP, you should just set a bit in the "struct request_queue", and then return 0.

IOW, something like this

--- a/block/blk-barrier.c
+++ b/block/blk-barrier.c
@@ -318,6 +318,9 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
if (!q)
return -ENXIO;
+ if (is_queue_noflush(q))
+ return 0;
+
bio = bio_alloc(GFP_KERNEL, 0);
if (!bio)
return -ENOMEM;
@@ -339,7 +342,7 @@ int blkdev_issue_flush(struct block_device *bdev, sector_t *error_sector)
ret = 0;
if (bio_flagged(bio, BIO_EOPNOTSUPP))
- ret = -EOPNOTSUPP;
+ set_queue_noflush(q);
else if (!bio_flagged(bio, BIO_UPTODATE))
ret = -EIO;

which just returns 0 if we don't support flushing on that queue.

(Obviously incomplete patch, which is why I also intentionally whitespace-broke it).

Sure, we could cache this value, but it's pretty
pointless since the filesystem will stop sending barriers in this case.

Well no, it won't. Or rather, it will have to have such a stupid per-filesystem flag, for no good reason.

Sorry, I just don't see much point to doing it this way instead. So now
the fs will have to check a queue bit after it has issued the flush, how
is that any better than having the 'error' returned directly?

AFAICS, the aim is simply to return zero rather than EOPNOTSUPP, for the not-supported case, rather than burdening all callers with such checks.

Which is quite reasonable for Fernando's patch -- the direct call fsync case.

But that leaves open the possibility that some people really do want the EOPNOTSUPP return value, I guess? Do existing callers need that?

For blkdev_issue_flush() it may not be very interesting, since there's
not much we can do about that. Just seems like very bad style to NOT
return an error in such a case. You can assume that ordering is fine,
but it definitely wont be in all case (eg devices that have write back
caching on by default and don't support flush).

So?

The thing is, you can't _do_ anything about it. So what's the point in returning an error? The caller cannot possibly care - because there is nothing the caller can really do.

Not for blkdev_issue_flush(), all they can do is report about the
device. And even that would be a vague "Your data may or may not be
safe, we don't know".

Sure, the device may or may not re-order things, but since the caller can't know, and can't really do a thing about it _anyway_, you're just better off not even confusing anybody.

I'd call that a pretty reckless approach to data integrity, honestly.
You HAVE to issue an error in this case. Then the user/admin can at least
check up on the device stack in question, and determine whether this is
an issue or not. That goes for both blkdev_issue_flush() and the actual
barrier write. And perhaps the cached value is then of some use, since
you then know when to warn (bit not already set) and you can keep the
warning in blkdev_issue_flush() instead of putting it in every call
site.

Indeed -- if the drive tells us it failed the cache flush, it seems self-evident that we should be passing that failure back to userspace where possible.

And as the patches show, it is definitely possible to return a FLUSH CACHE error back to an fsync(2) caller [though, yes, I certainly recognize fsync is not the only generator of these requests].

Jeff

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Jeff Mahoney: "Re: [patch 32/35 code-cleanup] reiserfs: rename p_s_inode to inode"
Previous message: Jesse Barnes: "Re: [PATCH] pci: don't enable too much HT MSI mapping -v6"
In reply to: Jens Axboe: "Re: [PATCH 1/7] block: Add block_flush_device()"
Next in thread: Chris Mason: "Re: [PATCH 1/7] block: Add block_flush_device()"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]