Re: [GIT PULL] Ext3 latency fixes

From: Linus Torvalds
Date: Sun Apr 05 2009 - 13:05:29 EST

Next message: Paul E . McKenney: "[PATCH] Make idle loop (mostly) safe for RCU read-side critical sections."
Previous message: Andi Kleen: "Re: [PATCH -tip] Convert CONFIG_SMP=y powerpc defconfigs to TREE_RCU."
In reply to: Arjan van de Ven: "Re: [GIT PULL] Ext3 latency fixes"
Next in thread: Mark Lord: "Re: [GIT PULL] Ext3 latency fixes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Sat, 4 Apr 2009, Theodore Tso wrote:
>
> It may be that it's easier from an less-lines-of-the-kernels-to-change
> point of view to add a WRITE_SYNC_PLUGGED which doesn't do the unplug,
> and keep WRITE_SNYC as having the implicit unplug. Or maybe it would
> be better to completely separate the "send a write which is marked as
> SYNC" from the "implicit unplug" in the API.

Well, the block layer internally already separates the two, it's just
WRITE_SYNC that ties them together. So a trivial patch like the appended
would make WRITE_SYNC just mean "this is synchronous IO" without the
unplugging, and then WRITE_UNPLUG would mark it both as synchronous _and_
unplug it.

(Or you could do the WRITE_SYNC_PLUGGED model too, but I think WRITE_SYNC
and WRITE_UNPLUG actually would be better names, since they talk about the
effects more directly).

The question is more of a "who really wants the unplugging". Presumably
the direct-io code-path really does (on the assumption that if you are
doing direct-io, the caller had better done all the coalescing it wants).

Non-rotational media would tend to want the unplugging, but the block
layer already does that (ie in __make_request() we already do:

if (unplug || blk_queue_nonrot(q))
__generic_unplug_device(q);

so non-rotational devices get unplugged whether the request was a
unplugging request or not).

Of course, different IO schedulers react differently to that whole "sync
vs unplug" thing. I think CFQ is the only one that actually cares about
the "sync" bit (using different queues for sync vs async). The other
schedulers only care about the plugging. So the patch below really doesn't
make much sense as-is, because as things are right now, the scheduler
behaviors are so different for the unplug-vs-sync thing that no sane user
can ever know whether they should use WRITE_SYNC (== higher priority
queueing for CFQ, no-op for others) or WRITE_UNPLUG (unplug on all, and
additionally higher priority for CFQ).

So I'm not serious about this patch, because as things are right now, I
don't think it's sensible, but I do think that this just generally shows
the confusion of what we should do. When is plugging right, when isn't it?

Also, one of the issues seems to literally be that the higher-level
request handling doesn't care AT ALL about the priority. Allocating the
request itself does keep reads and writes separated, but if the request is
a SYNCIO request, and non-sync writes have filled up th write requests,
we'll have to wait for the background writes to free up requests.

That is quite possibly the longest wait we have in the system.

See get_request():

const int rw = rw_flags & 0x01;

(That should _really_ be RW_MASK instead of 0x01, I suspect) and how the
request allocation itself is done.

I think this is broken.

Linus

---
include/linux/fs.h | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index a09e17c..a144377 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -95,7 +95,8 @@ struct inodes_stat_t {
#define SWRITE 3 /* for ll_rw_block() - wait for buffer lock */
#define READ_SYNC (READ | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define READ_META (READ | (1 << BIO_RW_META))
-#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
+#define WRITE_SYNC (WRITE | (1 << BIO_RW_SYNCIO))
+#define WRITE_UNPLUG (WRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define SWRITE_SYNC (SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG))
#define WRITE_BARRIER (WRITE | (1 << BIO_RW_BARRIER))
#define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Paul E . McKenney: "[PATCH] Make idle loop (mostly) safe for RCU read-side critical sections."
Previous message: Andi Kleen: "Re: [PATCH -tip] Convert CONFIG_SMP=y powerpc defconfigs to TREE_RCU."
In reply to: Arjan van de Ven: "Re: [GIT PULL] Ext3 latency fixes"
Next in thread: Mark Lord: "Re: [GIT PULL] Ext3 latency fixes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]