[PATCH 00/27] blk_end_request: full I/O completion handler (take 2)

From: Kiyoshi Ueda
Date: Tue Sep 11 2007 - 14:55:37 EST


Hello,

The following is the revised patch-set, fixing the issue pointed
out by Jens and Benny. Changes from the previous post are below:
o Rebased on top of 2.6.23-rc4-mm1
o Don't pass the lock held information (PATCH 01)
o Removed sect2byte() macro (PATCH 02)
o fixed blk_rq_size() and blk_rq_cur_size() for blk_pc_requests (PATCH 02)
o Separated the patch for changes of end_that_request_* user (PATCH 03-26)
o Removed the patch which changes the role of rq->end_io()
from this patch-set because some more discussions are needed
about it.

Summary of each patch are below:
01/27: add new request completion interface, blk_end_request()
02/27: add some functions to get the size of request in bytes
03/27: convert to use blk_end_request() (core parts of block layer)
04/27: convert to use blk_end_request() (arm)
05/27: convert to use blk_end_request() (um)
06/27: convert to use blk_end_request() (DAC960)
07/27: convert to use blk_end_request() (floppy)
08/27: convert to use blk_end_request() (lguest)
09/27: convert to use blk_end_request() (nbd)
10/27: convert to use blk_end_request() (ps3disk)
11/27: convert to use blk_end_request() (sunvdc)
12/27: convert to use blk_end_request() (sx8)
13/27: convert to use blk_end_request() (ub)
14/27: convert to use blk_end_request() (viodasd)
15/27: convert to use blk_end_request() (xen-blkfront)
16/27: convert to use blk_end_request() (viocd)
17/27: convert to use blk_end_request() (i2o_block)
18/27: convert to use blk_end_request() (mmc)
19/27: convert to use blk_end_request() (s390)
20/27: convert to use blk_end_request() (scsi mid-layer)
21/27: convert to use blk_end_request() (ide-scsi)
22/27: convert to use blk_end_request() (xsysace)
23/27: convert to use blk_end_request() (cciss)
24/27: convert to use blk_end_request() (cpqarray)
25/27: convert to use blk_end_request() (normal parts of ide)
26/27: convert to use blk_end_request() (ide-cd, cdrom_newpc_intr())
27/27: remove/unexport no longer needed end_that_request_*

I have tested this patch-set on two machines,
IA64+QLA1280+QLA2200 box and x86_64+SATA+IDE-CDROM box.
Please review and apply.


Below is the explanation about needs and details of this patch-set.

SUMMARY
=======
This set of patches changes request completion interface
between device drivers and block layer to 1 step procedure
from current 2 step procedures using end_that_request_{first/chunk}
and end_that_request_last().

This patch-set prepares for realizing another patch-set which changes
the role of rq->end_io(). It allows request-based multipath to hook
in before completing each chunk of request, check errors for it and
retry it using another path if error is detected.


BACKGROUND
==========
The patch-set which changes the role of rq->end_io() is necessary
to allow device stacking at request level, that is request-based
device-mapper multipath.
Currently, device-mapper is implemented as a stacking block device
at BIO level. OTOH, request-based dm will stack at request level
to allow better multipathing decision.
To allow device stacking at request level, the completion procedure
need to provide a hook for it.
For example, dm-multipath has to check errors and retry with other
paths if necessary before returning the I/O result to upper layer.
struct request has 'end_io' hook currently. But it's called at
the very late stage of completion handling where the I/O result
is already returned to the upper layer.
So we need something here.

The first approach to hook in completion of each chunk of request
was adding a new rq->end_io_first() hook and calling it on the top
of __end_that_request_first().
- http://marc.theaimsgroup.com/?l=linux-scsi&m=115520444515914&w=2
- http://marc.theaimsgroup.com/?l=linux-kernel&m=116656637425880&w=2
However, Jens pointed out that redesigning rq->end_io() as a full
completion handler would be better:

On Thu, 21 Dec 2006 08:49:47 +0100, Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> Ok, I see what you are getting at. The current ->end_io() is called when
> the request has fully completed, you want notification for each chunk
> potentially completed.
>
> I think a better design here would be to use ->end_io() as the full
> completion handler, similar to how bio->bi_end_io() works. A request
> originating from __make_request() would set something ala:
.....
> instead of calling the functions manually. That would allow you to get
> notification right at the beginning and do what you need, without adding
> a special hook for this.

I thought his comment was reasonable.
So I modified the patches based on his suggestion.


WHAT IS CHANGED
===============
The change is basically illustlated by the following pseudo code:

[Before]
if (end_that_request_{first/chunk} succeeds) { <-- completes bios
<do something driver specific>
end_that_request_last() <-- calls end_io()
<the request is free from the driver>
} else {
<the request was incomplete, retry for leftover or ignoring>
}

[After]
if (blk_end_request() succeeds) { <-- calls end_io(), completes bios
<the request is free from the driver>
} else {
<the request was incomplete, retry for leftover or ignoring>
}


In detail, request completion procedures are changed like below.

[Before]
o 2 steps completion using end_that_request_{first/chunk}
and end_that_request_last().
o Device drivers have ownership of a request until they
call end_that_request_last().
o rq->end_io() is called at the last stage of
end_that_request_last() for some block layer codes need
specific request handling when completing it.

[After]
o 1 step completion using blk_end_request().
(end_that_request_* are no longer used from device drivers.)
o Device drivers give over ownership of a request
when calling blk_end_request().
If it returns 0, the request is completed.
If it returns 1, the request isn't completed and
the ownership is returned to the device driver again.
o rq->end_io() is called at the top of blk_end_request() to
allow to hook all parts of request completion.
Existing users of rq->end_io() must be changed to do
all parts of request completion.

Thanks,
Kiyoshi Ueda
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/