Re: (reiserfs) Re: More on 2.2.18pre2aa2

From: James Sutherland (jas88@cam.ac.uk)
Date: Thu Sep 14 2000 - 02:38:06 EST


On Wed, 13 Sep 2000, Rik van Riel wrote:

> On Thu, 14 Sep 2000, Ragnar Kjørstad wrote:
> This (IMHO) is wrong. When the drive has trouble keeping up
> with requests in FCFS order, we should do some elevator sorting
> to keep throughput high enough to deal with the requests we get.

Reversing that, you mean the drive should be given requests on an FCFS
basis until it is running flat out, at which point elevator sorting should
come into play? That's what I would suggest: under light loads, requests
are then handled with no extra latency at all, but we still get increased
throughput under heavier load.
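
Something like this is what I have in mind (a rough sketch only - none
of these names are real kernel interfaces):

#include <stdio.h>

/* Sketch only: these types are hypothetical, not the kernel's
 * actual request/queue structures. */
struct request {
    long sector;                /* first sector of the transfer */
    struct request *next;
};

struct device {
    int busy;                   /* is a request in flight? */
    struct request *pending;    /* sector-sorted elevator queue */
};

/* Stand-in for handing a request straight to the hardware. */
static void start_io(struct device *dev, struct request *rq)
{
    dev->busy = 1;
    printf("FCFS dispatch: sector %ld\n", rq->sector);
}

/* Keep the pending queue in ascending sector order (the elevator). */
static void elevator_insert(struct request **head, struct request *rq)
{
    while (*head && (*head)->sector < rq->sector)
        head = &(*head)->next;
    rq->next = *head;
    *head = rq;
}

static void submit(struct device *dev, struct request *rq)
{
    if (!dev->busy)
        start_io(dev, rq);      /* light load: no sorting latency */
    else
        elevator_insert(&dev->pending, rq);  /* flat out: sort */
}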

> Jeff's idea would take care of this, together with a *limited*
> (say, 32 entries) queue size. Every time the "work" queue (the
> one the disk is working on) is empty, we switch queues.

Then process one queue until it's empty, even if there are higher priority
requests in another queue? I'm not so sure about that: then my backup
program or a file copy will be getting between my MP3 player and the disk.

> Processes get to put their request on the other queue, either
> until the disk takes their requests (the queues are switched)
> or until the queue is full. That way only the N highest priority
> processes can get access to the disk if we're really overloaded
> in a bad way.

How does your approach block lower priority processes? If a lower
priority process issues a request just before a higher priority one,
it is inserted into the "filling" queue; once we switch queues, that
request will be handled - perhaps ahead of the higher priority one, if
we're unlucky with the timing?
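
As I read the proposal, it would look roughly like this (the 32-entry
limit is from the discussion; everything else is made up):

#include <stddef.h>

#define MAX_FILL 32

struct request { struct request *next; };

static struct request *work;      /* queue the disk is draining */
static struct request *fill;      /* queue new requests land on */
static int fill_count;

/* Returns nonzero if accepted; a full queue makes the caller block,
 * which is what keeps the N highest priority processes in control. */
static int queue_request(struct request *rq)
{
    if (fill_count >= MAX_FILL)
        return 0;
    rq->next = fill;              /* ordering within the queue */
    fill = rq;                    /* is ignored for brevity */
    fill_count++;
    return 1;
}

/* When the work queue empties, the queues are switched. Note the
 * window: a low priority request that sneaks onto the fill queue
 * just before the switch is serviced ahead of a higher priority
 * one arriving just after it. */
static struct request *next_for_disk(void)
{
    struct request *rq;

    if (!work) {
        work = fill;
        fill = NULL;
        fill_count = 0;
    }
    if (!work)
        return NULL;
    rq = work;
    work = rq->next;
    return rq;
}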

If we can keep it sorted reasonably efficiently, a single rolling queue
seems better - or perhaps model it on the process scheduler? Have
"realtime" requests ("it doesn't matter if everything else locks up, this
request MUST be completed ASAP"), then "other" requests of various
priorities?
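
Sketching what I mean (the classes and the ordering rule here are
invented purely for illustration):

/* One rolling queue with scheduler-style classes. */
enum io_class { IO_REALTIME, IO_NORMAL, IO_IDLE };

struct request {
    enum io_class class;        /* realtime beats everything */
    int prio;                   /* lower = more urgent in a class */
    long sector;
    struct request *next;
};

/* Sort by class, then priority, then sector - so the elevator only
 * reorders requests of equal urgency, and a realtime request always
 * goes to the front regardless of disk position. */
static int goes_before(const struct request *a, const struct request *b)
{
    if (a->class != b->class)
        return a->class < b->class;
    if (a->prio != b->prio)
        return a->prio < b->prio;
    return a->sector < b->sector;
}

static void queue_insert(struct request **head, struct request *rq)
{
    while (*head && goes_before(*head, rq))
        head = &(*head)->next;
    rq->next = *head;
    *head = rq;
}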

> The current HUGE queue size is probably another reason for
> the very bad latencies we sometimes see...

If the code ever sits there sorting the queue while the disk goes
idle, the code needs adjusting: getting requests processed in a
sub-optimal order is still better than not getting any requests
processed at all for some period of time!

> > > (maybe with the twist that we /do/ allow merges on the queue
> > > that's being processed by the disk, but no insertions of new
> > > requests)
> >
> > I don't think you should allow merges. If one process is doing a big
> > IO operation on a big file it would still get _all_ capacity, right?
>
> Only if you were to allow unlimited merges ;)
>
> I guess a 'decent' maximum size for one request would
> be somewhere around 256kB, since this is the amount of
> data a drive can read in the time of one disk seek...

That's one approach; I prefer my "weighted scoring" idea. Suppose we
have three devices: a solid state disk (instant "seeks"), a hard drive
and a tape. The SSD will benefit from merges (fewer commands to
process), but not hugely - set both costs to 1, so a 64KB request is
just under twice the "cost" of a 32KB one. The hard drive can seek
fairly quickly, but long reads are preferable - say, seek_cost = 16,
block_cost = 1. A single 64KB request is "cheaper" than a pair of 32KB
requests, but not hugely so. Then the tape: seeks can take a few
minutes, so make it seek_cost = 65536, block_cost = 1: we don't really
care how much is being read, within reason, since seeks are so
"expensive".
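
To make the arithmetic concrete, here's a toy program using those
numbers (the struct and function names are invented; only the costs
come from above):

#include <stdio.h>

struct device_cost {
    const char *name;
    long seek_cost;             /* flat cost charged per request */
    long block_cost;            /* cost per KB transferred */
};

static long request_cost(const struct device_cost *d, long kbytes)
{
    return d->seek_cost + kbytes * d->block_cost;
}

int main(void)
{
    struct device_cost devs[] = {
        { "ssd",  1,     1 },
        { "disk", 16,    1 },
        { "tape", 65536, 1 },
    };
    int i;

    /* Compare one merged 64KB request with two separate 32KB ones. */
    for (i = 0; i < 3; i++)
        printf("%-4s: one 64KB = %6ld  two 32KB = %6ld\n",
               devs[i].name,
               request_cost(&devs[i], 64),
               2 * request_cost(&devs[i], 32));
    return 0;
}

Run it and the SSD comes out at 65 against 66 (merging barely
matters), the disk at 80 against 96, and the tape at 65600 against
131136 - exactly the behaviour we want from each device.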

In short, having a single rolling queue (probably divided into
priority classes) seems preferable; we should also bypass the elevator
entirely if the device is able to accept new commands at the time.
(Q: how intelligent will drives that handle command queueing
themselves actually be?)

James.



