Re: [PATCH][RFC] AIO: always reinitialize iocb->ki_run_list at the end of aio_run_iocb()

From: Jeff Moyer
Date: Tue Jun 01 2010 - 17:15:35 EST


Sergey Temerkhanov <temerkhanov@xxxxxxxxx> writes:

> On Wednesday 26 May 2010 23:38:35 Jeff Moyer wrote:
> ...
>> I can vaguely recall discussion surrounding the reference counting of
>> cancel methods, but I have no idea what the actual contents of those
>> discussions were. Sorry, my memory has failed me. Either Zach or
>> Suparna might remember better.
>>
>> Sergey, the cancellation path, unfortunately, is not well exercised as
>> I'm sure you are aware. As you pointed out, the only implementation of
>> a cancel method is the usb gadget interface. Now, given that they've
>> worked fine with the extra put in their cancel method, I'm not sure why
>> you can't do the same.
> Well, in fact, they have only one aio_put_req() in their cancel method. This
> is the code from 2.6.34:

I was referring to the aio_put_req done deeper in the call chain by the
completion methods for the usb gadgetfs request.

> And adding extra aio_put_req() to the cancel method will not fix failing
> kick_iocb() which is another problem and this patch is supposed to address it.

I guess I'm confused. You wrote the following:

> I've written the driver code which implements a zero-copy DMA char device. It
> has aio_read() and aio_write() methods which return -EIOCBQUEUED after the
> successful preparation of the buffers described by kiocb and posting it to the
> descriptor chain. When the descriptors are processed, the DMA engine raises
> the interrupt and the cleanup work is done in the handler, including
> aio_complete() for the completed kiocbs.
>
> This works fine, however, there is a problem with canceling the queued
> requests, especially on io_destroy() syscall. Since there is no simple way to
> remove single kiocb from the descriptor chain, I'm removing all of them from
> the queue using aio_complete() or aio_put_req() in the ki_cancel() callback
> routine of my driver. The main problem is the reference counting in
> aio_cancel_all():
>
>     if (cancel) {
>             iocb->ki_users++;
>             spin_unlock_irq(&ctx->ctx_lock);
>             cancel(iocb, &res);
>             spin_lock_irq(&ctx->ctx_lock);
>     }
>
> Here the iocb->ki_users gets incremented which already has the value 1 at this
> point (after the io_submit_one() completion) and it's never released (). So I
> have to call aio_put_req() twice for the given kiocb (this seems to be the
> hack to me) or I'll end up with the unkillable process stuck in
> wait_for_all_aios() at the io_schedule(). I've posted the patches where I've
> added aio_put_req() but I think it needs more testing.
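
To make the counting described above concrete, here is a minimal
user-space model of it. This is a sketch, not kernel code: struct
model_iocb and the model_*() functions are hypothetical stand-ins, and
the only behaviour assumed is what the quoted text states (ki_users is 1
after io_submit_one(), and the quoted loop body increments it before
calling the cancel method).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kiocb; only the refcount is modeled. */
struct model_iocb {
	int ki_users;	/* reference count, named as in the quoted snippet */
	bool freed;	/* set once the last reference is dropped */
};

/* Models aio_put_req(): drop one reference, release at zero. */
void model_put_req(struct model_iocb *iocb)
{
	assert(iocb->ki_users > 0);
	if (--iocb->ki_users == 0)
		iocb->freed = true;
}

/* Models a driver ki_cancel() that drops `puts` references. */
void model_cancel(struct model_iocb *iocb, int puts)
{
	while (puts-- > 0)
		model_put_req(iocb);
}

/*
 * Models the quoted loop body for a single iocb and returns the number
 * of references still outstanding afterwards.
 */
int model_cancel_all(int puts_in_cancel)
{
	/* State after io_submit_one(), per the quoted description. */
	struct model_iocb iocb = { .ki_users = 1, .freed = false };

	iocb.ki_users++;			/* taken before cancel() */
	model_cancel(&iocb, puts_in_cancel);
	return iocb.ki_users;
}
```

In this model, model_cancel_all(1) leaves one reference outstanding,
which corresponds to the process stuck in wait_for_all_aios(), while
model_cancel_all(2) drops the count to zero.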

OK, you tried two aio_put_req() calls, and it worked, but you thought
maybe it wasn't the right approach. So:

> So, I've tried another approach (hack) - requeue the kiocb with
> kick_iocb() before calling aio_put_req() in the ki_cancel() callback
> (that's because aio_run_iocb() takes some special actions for the
> canceled kiocbs). And I've found out that kick_iocb() fails because
> aio_run_iocb() does this:
> iocb->ki_run_list.next = iocb->ki_run_list.prev = NULL;
> and only reinitializes iocb->ki_run_list when iocb->ki_retry() returns
> -EIOCBRETRY but kick_iocb() is exported and looks like intended for usage
> (though not recommended).
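
One way to see why NULLed list pointers could make a requeue fail is
sketched below as a user-space model. The assumption, which is not
stated in the quoted text, is that the queueing path only adds the iocb
when a list_empty()-style test on ki_run_list succeeds. The names
init_list_head(), list_is_empty(), model_try_queue() and
model_kick_after_run() are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list head, modeled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* An initialized, unlinked head points at itself. */
void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* "Empty" means next points back at the head; NULL does not qualify. */
int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Assumed gate: queue the entry only if it looks unlinked. */
int model_try_queue(struct list_head *run_list, struct list_head *queue)
{
	assert(run_list != NULL && queue != NULL);
	if (!list_is_empty(run_list))
		return 0;		/* treated as already queued */

	/* Append run_list at the tail of queue. */
	run_list->next = queue;
	run_list->prev = queue->prev;
	queue->prev->next = run_list;
	queue->prev = run_list;
	return 1;
}

/* Returns 1 if a kick succeeds after the NULL assignment, else 0. */
int model_kick_after_run(int reinit)
{
	struct list_head queue, run_list;

	init_list_head(&queue);
	init_list_head(&run_list);

	/* The assignment quoted from aio_run_iocb(). */
	run_list.next = run_list.prev = NULL;

	if (reinit)
		init_list_head(&run_list);	/* what the patch title proposes */

	return model_try_queue(&run_list, &queue);
}
```

Under that assumption, model_kick_after_run(0) fails to queue and
model_kick_after_run(1) succeeds.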

You implemented this other hack, and ran into troubles, so you're
modifying the aio core to fix it. Am I wrong in concluding that if you
keep your first solution above, you no longer need this second?

You may also find the following an interesting read:

http://permalink.gmane.org/gmane.linux.kernel.aio.general/2571

Cheers,
Jeff