Re: [2/2] fs: Fix race between io_destroy() and io_submit() in AIO

From: Jan Kara
Date: Tue Feb 15 2011 - 14:15:22 EST


Hi,

On Tue 15-02-11 12:50:32, Milton Miller wrote:
> On Tue, 15 Feb 2011 about 11:16:16 -0600, Jan Kara wrote:
> > On Tue 15-02-11 12:59:24, Milton Miller wrote:
> > > > A race can occur when io_submit() races with io_destroy():
> > > >
> > > > CPU1 CPU2
> > > > io_submit()
> > > > do_io_submit()
> > > > ...
> > > > ctx = lookup_ioctx(ctx_id);
> > > > io_destroy()
> > > > Now do_io_submit() holds the last reference to ctx.
> > > > ...
> > > > queue new AIO
> > > > put_ioctx(ctx) - frees ctx with active AIOs
> > > >
> > > > We solve this issue by checking whether ctx is being destroyed
> > > > in AIO submission path after adding new AIO to ctx. Then we
> > > > are guaranteed that either io_destroy() waits for new AIO or
> > > > we see that ctx is being destroyed and bail out.
> > > >
> > > > Reviewed-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
> > > > Signed-off-by: Jan Kara <jack@xxxxxxx>
> > > > CC: Nick Piggin <npiggin@xxxxxxxxx>
> > > >
> > > > ---
> > > > fs/aio.c | 15 +++++++++++++++
> > > > 1 files changed, 15 insertions(+), 0 deletions(-)
> > > >
> > > > diff --git a/fs/aio.c b/fs/aio.c
> > > > index b4dd668..0244c04 100644
> > > > --- a/fs/aio.c
> > > > +++ b/fs/aio.c
> > > > @@ -1642,6 +1642,21 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> > > > goto out_put_req;
> > > >
> > > > spin_lock_irq(&ctx->ctx_lock);
> > > > + /*
> > > > + * We could have raced with io_destroy() and are currently holding a
> > > > + * reference to ctx which should be destroyed. We cannot submit IO
> > > > + * since ctx gets freed as soon as io_submit() puts its reference.
> > > > + * The check here is reliable since io_destroy() sets ctx->dead before
> > > > + * waiting for outstanding IO. Thus if we don't see ctx->dead set here,
> > > > + * io_destroy() waits for our IO to finish.
> > > > + * The check is inside ctx->ctx_lock to avoid extra memory barrier
> > > > + * in this fast path...
> > > > + */
> > >
> > > When reading this comment, and with all of the recient discussions I
> > > had with Paul in the smp ipi thread (especially with resepect to third
> > > party writes), I looked to see that the spinlock was paired with the
> > > spinlock to set dead in io_destroy. It is not. It took me some time
> > > to find that the paired lock is actually in wait_for_all_aios. Also,
> > > dead is also set in aio_cancel_all which is under the same spinlock.
> > >
> > > Please update this lack of memory barrier comment to reflect the locking.
>
> This locking description is wrong:
>
> > Hum, sorry but I don't understand. The above message wants to say that
> > io_destroy() does
> > ctx->dead = 1
> > barrier (implied by a spin_unlock)
>
> no spin_unlock only does a release barrier.
>
> > wait for reqs_active to get to 0
>
> This read can move up into the spinlocked region (up to the lock acquire).
>
> >
> > while io_submit() does
> > increment reqs_active
> > barrier (implied by a spin_lock - on a different lock but that does not
> > matter as we only need the barrier semantics)
>
> No only an acquire barrier, old writes can move into the spinlock region
>
> > check ctx->dead
>
> the increment can move down past this check to the unlock here.
Ah OK, you're right. I was typing too fast and thinking too slow ;).

> > So if io_submit() gets past ctx->dead check, io_destroy() will certainly
> > wait for our reference in reqs_active to be released.
> >
> > I don't see any lock pairing needed here... But maybe I miss something.
> >
> > Honza
>
> spin lock and unlock are only half barriers as described in
> Documentation/memory-barriers.txt
>
>
> Now, as I said, the code is ok because the active count is read and
> written under ctx->ctx_lock, and aio_cancel_all sets dead under
> that lock.
>
> But the comment needs to reflect that and not just the the code is
> under in some random spin_lock region instead of a memory barrier,
> which is not sufficient. Bad lock descriptions leads to making bad
> code in the future, either through copying it to another context or
> though future work removing the additional constraints not mentioned.
>
> So please, comment which locks are being used here, as what
> you described is not enough.
Yep, I'll improve the comment. Thanks for explanation.

Honza
--
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/