Re: [patch] bug fix in dio handling write error - v2

From: Suparna Bhattacharya
Date: Sun Mar 19 2006 - 06:53:18 EST


On Sun, Mar 19, 2006 at 01:27:33AM -0800, Chen, Kenneth W wrote:
> Referring to the original posting:
> http://marc.theaimsgroup.com/?t=113752710100001&r=1&w=2
>
> Suparna pointed out that this fix has a potential race window. I think
> the race condition also currently exists on the READ side. The fundamental
> problem is that dio->result is overloaded with two uses: an indicator that
> the fallback path is needed for a partial dio write, and an error indicator
> used in the I/O completion path. In the event of a device error, setting
> dio->result to -EIO clashes with the value used to track the partial write
> that activates the fallback path.
>
> One way to fix the race is to pull all the code that uses dio->result up
> before I/O is submitted, and then never look at dio->result once I/O is
> submitted. Alternatively, an independent variable could be introduced to
> track the fallback state; note that to set up that tracking, the kernel
> still has to look at dio->result before I/O is submitted.
>
> The following patch implements the first option. I would appreciate some
> reviews. Thanks.
>
>
>
> [patch] bug fix in dio handling write error
>
> There is a bug in direct-io in propagating a write error up to the
> higher I/O layer. When performing an async O_DIRECT write to a
> block device, if a device error occurs (such as a media error or
> the disk being pulled), the error code is propagated only from the
> device driver to the DIO layer; it stops at finished_one_bio(). The
> async write, however, is supposed to have a corresponding AIO event
> with the appropriate return code (in this case -EIO). An application
> waiting on the async write event will hang forever, since that AIO
> event is never delivered (at least if the app did not use the timeout
> option in the io_getevents call; regardless, an AIO event is lost).
>
> The discovery of the above bug led to the discovery of a potential
> race window with dio->result. The fundamental problem is that
> dio->result is overloaded with two uses: an indicator that the
> fallback path is needed for a partial dio write, and an error
> indicator used in the I/O completion path. In the event of a device
> error, setting dio->result to -EIO clashes with the value used to
> track the partial write that activates the fallback path.
>
> The proposed fix for the race is to pull all the code that uses
> dio->result up before I/O is submitted, and then never look at
> dio->result once I/O is submitted.
>
> Signed-off-by: Ken Chen <kenneth.w.chen@xxxxxxxxx>
>
> --- ./fs/direct-io.c.orig 2006-01-02 19:21:10.000000000 -0800
> +++ ./fs/direct-io.c 2006-03-19 01:57:49.797391064 -0800
> @@ -253,8 +253,7 @@ static void finished_one_bio(struct dio
> dio_complete(dio, offset, transferred);
>
> /* Complete AIO later if falling back to buffered i/o */
> - if (dio->result == dio->size ||
> - ((dio->rw == READ) && dio->result)) {
> + if (!dio->waiter) {
> aio_complete(dio->iocb, transferred, 0);
> kfree(dio);
> return;
> @@ -939,7 +938,7 @@ direct_io_worker(int rw, struct kiocb *i
> struct dio *dio)
> {
> unsigned long user_addr;
> - int seg;
> + int seg, should_wait = 0;
> ssize_t ret = 0;
> ssize_t ret2;
> size_t bytes;
> @@ -1031,6 +1030,16 @@ direct_io_worker(int rw, struct kiocb *i
> }
> } /* end iovec loop */
>
> + /*
> + * We'll have to wait for a partial write to drain before falling back
> + * to buffered. We check this before submitting io so that completion
> + * doesn't have a chance to overwrite dio->result with -EIO.
> + */
> + if (dio->result < dio->size && dio->is_async && rw == WRITE) {
> + dio->waiter = current;
> + should_wait = 1;
> + }
> +

Isn't there a possibility that part of the I/O for the overall request
may already have been submitted at this point? (i.e. within
do_direct_IO->submit_page_section->dio_send_cur_page->dio_bio_submit)
This is what I was referring to in my earlier response to Zach's patch.

> if (ret == -ENOTBLK && rw == WRITE) {
> /*
> * The remaining part of the request will be
> @@ -1051,6 +1060,7 @@ direct_io_worker(int rw, struct kiocb *i
> page_cache_release(dio->cur_page);
> dio->cur_page = NULL;
> }
> + ret2 = dio->result;
> if (dio->bio)
> dio_bio_submit(dio);
>
> @@ -1073,14 +1083,8 @@ direct_io_worker(int rw, struct kiocb *i
> * reflect the number of to-be-processed BIOs.
> */
> if (dio->is_async) {
> - int should_wait = 0;
> -
> - if (dio->result < dio->size && rw == WRITE) {
> - dio->waiter = current;
> - should_wait = 1;
> - }
> if (ret == 0)
> - ret = dio->result;
> + ret = ret2;
> finished_one_bio(dio); /* This can free the dio */
> blk_run_address_space(inode->i_mapping);
> if (should_wait) {
>
>
>

Regards
Suparna

--
Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India
