Re: [PATCH 7/9] aio: add delayed cancel support

From: Al Viro
Date: Thu Mar 22 2018 - 12:34:07 EST


On Wed, Mar 21, 2018 at 08:32:30AM +0100, Christoph Hellwig wrote:
> The upcoming aio poll support would like to be able to complete the
> iocb inline from the cancellation context, but that would cause
> a lock order reversal. Add support for optionally moving the cancelation
> outside the context lock to avoid this reversal.

Ouch... Seeing that you've just taken the cmpxchg loop out of kiocb_cancel()
with "serialized on ->ctx_lock" for explanation of safety... Let me check
the aio_poll side of it; this commit might be better off in the poll series,
*if* it is actually correct.

What's to prevent double completions there? Suppose we have iocb sitting on
the wait queue; cancellation callback set, so's "delayed cancel" flag.

Now, somebody tries to cancel the fucker on CPU1. With ctx->ctx_lock held the
sucker is found on the list and, just as we mark it "cancelled", driver sends
a wakeup, executing (on CPU2) aio_poll_wake(), calling aio_complete_poll()
(without ctx->ctx_lock, so no exclusion with io_cancel(2) on CPU1), which checks
AIO_IOCB_CANCELLED and does not notice the flag being set on CPU1, then
proceeds to __aio_complete_poll() and fput() in there.

In the meanwhile, CPU1 has taken the sucker off the list, dropped the
lock and called kiocb_cancel() on it. Now we get aio_poll_cancel()
and __aio_complete_poll() on CPU1, with *another* fput().
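
To make the interleaving concrete, here is a rough userspace model of it,
replayed step by step on a single thread. Everything in it is hypothetical
and for illustration only: struct model_iocb, model_complete() and
replay_race() are made-up names, the "fputs" counter merely stands in for
fput(), and the "guarded" variant (an atomic claim on completion) is just one
possible way to get exclusion, not something taken from the patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in fs/aio.c. */
struct model_iocb {
	bool cancelled;        /* stands in for AIO_IOCB_CANCELLED */
	atomic_bool completed; /* assumed "completion already claimed" bit */
	int fputs;             /* how many times the file reference was dropped */
};

/* Stands in for __aio_complete_poll(): drops the file reference. */
static void model_complete(struct model_iocb *req, bool guarded)
{
	/* Guarded variant: only the first caller to claim completion proceeds. */
	if (guarded && atomic_exchange(&req->completed, true))
		return;
	req->fputs++;
}

/* Replays the interleaving from the mail; returns the number of "fput()"s. */
int replay_race(bool guarded)
{
	struct model_iocb req = { .cancelled = false, .fputs = 0 };
	bool seen_cancelled;

	atomic_init(&req.completed, false);

	/* CPU2: wakeup path checks the cancelled flag, without the lock. */
	seen_cancelled = req.cancelled;

	/* CPU1: io_cancel(2) marks the request cancelled under the lock. */
	req.cancelled = true;

	/* CPU2: did not see the flag, so it goes on to complete the request. */
	if (!seen_cancelled)
		model_complete(&req, guarded);

	/* CPU1: lock dropped, cancel callback completes the request again. */
	model_complete(&req, guarded);

	return req.fputs;
}
```

In this model replay_race(false) drops the reference twice, which is the
double fput() in question, while replay_race(true) drops it once because the
second caller loses the atomic claim.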

What am I missing here that would prevent such a race?