Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application

From: badari
Date: Sat Sep 04 2004 - 12:33:46 EST


Daniel,

aio_complete() gets called only when we are done with this dio.
Other calls to finished_one_bio() should be fine. dio->result
should have the return value we want to send back. The fix
I made is to call aio_complete() only if we have something to
report back.

One problem is that dio->result gets updated for IO errors but
doesn't get updated for errors from get_user_pages(). Things
should be fine, but I am not really comfortable returning half of
the errors through aio_complete() and the other half through the
return value of do_direct_IO(). I guess it's okay, since some of the
IO errors can happen only after we submit the bio.
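
To make the split between the two error paths concrete, here is a
minimal user-space sketch. It is not the kernel code and not the actual
patch: every fix_* name, the struct layouts, and the errno constants are
hypothetical stand-ins, and it assumes "something to report" means
dio->result is non-zero. It only models who ends up reporting each kind
of error once the guard is in place.

```c
#include <assert.h>

/* Illustrative constants only; not taken from kernel headers. */
#define FIX_EIO          5
#define FIX_EFAULT      14
#define FIX_EIOCBQUEUED 529

/* Hypothetical stand-ins for the iocb and the dio. */
struct fix_iocb { int completions; long reported; };
struct fix_dio  { struct fix_iocb *iocb; int refcount; long result; };

enum fix_failure { FIX_GUP_FAULT, FIX_BIO_ERROR };

static void fix_aio_complete(struct fix_iocb *iocb, long res)
{
    iocb->completions++;
    iocb->reported = res;
}

/* Guarded variant: complete only if the dio has something to report. */
static void fix_finished_one_bio(struct fix_dio *dio)
{
    assert(dio->refcount > 0);
    if (--dio->refcount == 0 && dio->result != 0)
        fix_aio_complete(dio->iocb, dio->result);
}

static long fix_do_direct_io(struct fix_dio *dio, enum fix_failure f)
{
    if (f == FIX_GUP_FAULT) {
        /* dio->result is left untouched; the error travels as "ret". */
        fix_finished_one_bio(dio);
        return -FIX_EFAULT;
    }
    /* A bio was submitted; any IO error shows up later, at completion. */
    return -FIX_EIOCBQUEUED;
}

/* Models bio completion recording an IO error in dio->result. */
static void fix_bio_end_io(struct fix_dio *dio, long err)
{
    dio->result = err;
    fix_finished_one_bio(dio);
}

/* Models the caller: it completes the iocb itself unless the IO was queued. */
static struct fix_iocb fix_submit(enum fix_failure f)
{
    struct fix_iocb iocb = { 0, 0 };
    struct fix_dio dio = { &iocb, 1, 0 };
    long ret = fix_do_direct_io(&dio, f);

    if (ret != -FIX_EIOCBQUEUED)
        fix_aio_complete(&iocb, ret);      /* error reported via "ret" */
    else
        fix_bio_end_io(&dio, -FIX_EIO);    /* error reported via dio->result */
    return iocb;
}
```

In this model, fix_submit(FIX_GUP_FAULT) and fix_submit(FIX_BIO_ERROR)
each complete the iocb exactly once, but through different paths, which
is the asymmetry described above.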

Thanks,
Badari

Daniel McNeil wrote:

On Fri, 2004-09-03 at 08:52, Badari Pulavarty wrote:


On Tue, 2004-08-31 at 08:18, Andrew Morton wrote:


Begin forwarded message:

Date: Tue, 31 Aug 2004 06:15:18 -0700
From: bugme-daemon@xxxxxxxx
To: bugme-new@xxxxxxxxxxxxxx
Subject: [Bugme-new] [Bug 3317] New: Kernel oops in aio_complete while running AIO application


http://bugme.osdl.org/show_bug.cgi?id=3317



Hi Andrew,

I debugged this some more. Here is what's happening:

The test program used a program text address as the buffer to READ into.
DIO get_user_pages() returned EFAULT. We called finished_one_bio()
as part of dropping the ref to the dio, and it called aio_complete().
do_direct_IO() then returned EFAULT to the caller. aio_run_iocb() expects
to see EIOCBQUEUED/RETRY; otherwise it calls aio_complete() with the
"ret" value. This is where the second aio_complete() is coming from.
So we clean up "req", and on the next de-ref we get the OOPS.
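
The sequence above can be modelled in a few lines of user-space C. This
is only a toy sketch of the described call flow, not the kernel source:
the bug_* functions, the structs, and the numeric error values are all
hypothetical, and the "oops" is represented simply by counting how many
times the completion routine runs for one request.

```c
#include <assert.h>

/* Illustrative constants only; not taken from kernel headers. */
#define BUG_EFAULT      14
#define BUG_EIOCBQUEUED 529

/* Hypothetical stand-ins for the iocb ("req") and the dio. */
struct bug_iocb { int completions; };
struct bug_dio  { struct bug_iocb *iocb; int refcount; long result; };

static void bug_aio_complete(struct bug_iocb *iocb, long res)
{
    (void)res;
    iocb->completions++;   /* a second call here would touch a freed req */
}

/* Unguarded variant: dropping the last ref always completes the iocb. */
static void bug_finished_one_bio(struct bug_dio *dio)
{
    assert(dio->refcount > 0);
    if (--dio->refcount == 0)
        bug_aio_complete(dio->iocb, dio->result);
}

static long bug_do_direct_io(struct bug_dio *dio, int gup_fails)
{
    if (gup_fails) {
        bug_finished_one_bio(dio);   /* first completion */
        return -BUG_EFAULT;
    }
    /* IO queued; completion would happen later and is not modelled. */
    return -BUG_EIOCBQUEUED;
}

/* Returns how many times the iocb was completed for one submission. */
static int bug_run_iocb(int gup_fails)
{
    struct bug_iocb iocb = { 0 };
    struct bug_dio dio = { &iocb, 1, 0 };
    long ret = bug_do_direct_io(&dio, gup_fails);

    if (ret != -BUG_EIOCBQUEUED)
        bug_aio_complete(&iocb, ret);   /* second completion */
    return iocb.completions;
}
```

With gup_fails set, bug_run_iocb() counts two completions for a single
request, which corresponds to the double aio_complete() described here.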

The problem here is that finished_one_bio() shouldn't call aio_complete()
since no work has been done. I have a fix for this. Can you verify it?
I am not really comfortable with this "tweaking". (I am not really
sure about IO errors like EIO etc., and whether they can lead to calling
aio_complete() twice.)


The fix is to call aio_complete() ONLY if there is something to report.
Note that we don't update dio->result with any error codes from
get_user_pages(); they are just passed as the "ret" value from
do_direct_IO().

Thanks,
Badari



Badari,

This does fix the problem when running on my system (ext3).

One question: finished_one_bio() is called in 3 places.
Are you sure the other places won't be harmed by this
change?

I'm also looking over the code and will let you know if
I see any problems.

Daniel

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@xxxxxxxxxx For more info on Linux AIO,
see: http://www.kvack.org/aio/
