[PATCH] block/elevator updates + deadline i/o scheduler

From: Jens Axboe (axboe@suse.de)
Date: Fri Jul 26 2002 - 07:02:48 EST


Hi,

The 2nd version of the deadline i/o scheduler is ready for some testing.
The first patch makes various changes to the elevator/block layer that
are needed for the new i/o scheduler. Most of these just correct
assumptions about q->queue_head being the i/o scheduler sort list. In
detail:

o elevator_merge_requests_fn takes a queue argument as well

o __make_request() inits insert_here to NULL instead of
  q->queue_head.prev, which means the i/o schedulers must now check for
  this case explicitly.

o incorporate elv_queue_empty(); it was just a placeholder before

o add elv_get_sort_head(). It returns the sort head of the elevator for
  a given request, and attempt_{back,front}_merge uses it to determine
  whether a request is valid or not. Maybe attempt_{back,front}_merge
  should just be killed; I doubt they have much relevance with the wake
  up batching.

o call the merge_cleanup functions of the elevator _after_ the merge has
  been done, not before. This way the elevator functions see the new
  state of the request, which is the interesting one.

o Kill extra nr_sectors check in ll_merge_requests_fn()

o bio->bi_bdev is always set in __make_request(), so kill the check.

Once that is out of the way, we can plop the new i/o scheduler in. It's
not really ready for inclusion yet, due to a few 'issues':

o Barrier requests are not handled properly right now. This isn't a
  problem in the current kernel, but could be in the future.

o hard coded values

o hash probably needs a bit of work :-)

That said, how does it work? Well, the current Linux i/o schedulers use
q->queue_head as the main queue for dispatch, sorting, and merging
alike.
Pending requests are on that queue until the driver takes them off by
doing

        rq = elv_next_request(q);
        blkdev_dequeue_request(rq);
        /* now dispatch to hardware */

The layout of the deadline i/o scheduler is roughly:

        [1]    [2]
         |      |
         |      |
         |      |
         ===[3]===
             |
             |
             |
             |
            [4]

where [1] is the regular ascendingly sorted list of pending requests,
[2] is a fifo list (really two lists, one for reads and one for writes)
of pending requests, each of which has an expire time assigned, [3] is
the elv_next_request() worker, and [4] is the dispatch queue
(q->queue_head again). When a request enters the i/o scheduler, it is
sorted into the [1] list, assigned an expire time, and added to the
tail of the appropriate fifo list [2].

[1] is the main list we serve requests from. If a request's deadline is
exceeded, we move a number of entries (the 'fifo_batch' count) from the
sorted list, starting at the expired entry, onto the dispatch queue.
This makes sure that we at least attempt to start an expired request
immediately, but don't completely fall back to FCFS i/o scheduling (set
fifo_batch == 1 with an appropriately low expire time, and you will get
FCFS).

So the tunables in this i/o scheduler are:

o READ fifo_expire. Attempt to start a READ before this time. Set to 1
  second in the patch.

o WRITE fifo_expire. Ditto for writes. Set to 8 seconds in the patch.

o fifo_batch. See above explanation. Set to 8 in the patch.

I was too lazy to make them run-time changeable in this version, so you
have to edit deadline-iosched.c to set them.
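For illustration, the three tunables and their patch defaults could be
collected like this; the struct name, field names, and the HZ value of
100 are all assumptions for the sketch, not the layout used in
deadline-iosched.c:

```c
#include <assert.h>

#define HZ 100	/* assumed tick rate for the sketch; kernels vary */

struct deadline_tunables {
	int read_expire;	/* READ fifo_expire, in ticks */
	int write_expire;	/* WRITE fifo_expire, in ticks */
	int fifo_batch;		/* entries moved per expiry */
};

static const struct deadline_tunables deadline_defaults = {
	.read_expire	= 1 * HZ,	/* 1 second */
	.write_expire	= 8 * HZ,	/* 8 seconds */
	.fifo_batch	= 8,
};
```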

This version also includes a request merge hash. On back merging a bio
into an existing request, we can potentially save a queue scan by just
finding the rq in the hash. This is similar in idea to the bio hash that
was in the first 2.5.1-pre patches, only it works on requests instead
(ie it isn't broken like the bio version :-). I resurrected this idea
when Bartlomiej brought it up last week. I only added a back merge hash,
since previous experiments showed that 90% or more of merge hits are
back merge hits. We still check front merges, but only by doing the
queue scan itself. I should do new statistics on this :-)
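The back merge hash boils down to indexing each pending request by the
sector where it ends, so a bio that starts at that sector is found
without scanning the queue. A toy C sketch of that idea (struct toy_rq,
the hash size and all function names are invented here, not the patch's
actual code):

```c
#include <assert.h>
#include <stddef.h>

#define MERGE_HASH_SIZE 64

struct toy_rq {
	long start, nr_sectors;		/* covers [start, start + nr_sectors) */
	struct toy_rq *hash_next;	/* bucket chain */
};

static struct toy_rq *merge_hash[MERGE_HASH_SIZE];	/* zero-initialized */

static unsigned int rq_hash_key(long end_sector)
{
	return (unsigned long)end_sector % MERGE_HASH_SIZE;
}

/* hash a request by the sector where it ends */
static void rq_hash_add(struct toy_rq *rq)
{
	unsigned int h = rq_hash_key(rq->start + rq->nr_sectors);

	rq->hash_next = merge_hash[h];
	merge_hash[h] = rq;
}

/* a bio starting at 'sector' can back merge with a request ending there */
static struct toy_rq *rq_hash_find(long sector)
{
	struct toy_rq *rq;

	for (rq = merge_hash[rq_hash_key(sector)]; rq; rq = rq->hash_next)
		if (rq->start + rq->nr_sectors == sector)
			return rq;
	return NULL;
}
```

A real implementation also has to remove or rehash a request whenever a
merge changes where it ends; that bookkeeping is left out of the sketch.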

Finally, I've done some testing on it. No testing of whether it really
works well in real life (that's what I want testers to do), and no
testing of benchmark performance changes etc. What I have done is
beat-up testing, making sure it works without corrupting your data. I'm
fairly confident it does. Most testing was on SCSI (naturally), but IDE
has also been tested briefly.

On to the patches (I'm losing track of this email). Both are attached.

elv-queue_head-misc-1
        elevator/block misc changes mentioned. This has gone to Linus
        for 2.5.28 inclusion

deadline-iosched-5
        the deadline i/o scheduler

-- 
Jens Axboe





