Re: [patch] fs: aio fix rcu lookup

From: Nick Piggin
Date: Wed Jan 19 2011 - 11:03:41 EST


On Thu, Jan 20, 2011 at 12:21 AM, Jan Kara <jack@xxxxxxx> wrote:
> On Wed 19-01-11 11:20:45, Nick Piggin wrote:
>> On Wed, Jan 19, 2011 at 10:52 AM, Jan Kara <jack@xxxxxxx> wrote:
>> > On Wed 19-01-11 09:17:23, Nick Piggin wrote:
>> >> On Wed, Jan 19, 2011 at 6:01 AM, Jan Kara <jack@xxxxxxx> wrote:
>> >> > On Tue 18-01-11 10:24:24, Nick Piggin wrote:
>> >> >> On Tue, Jan 18, 2011 at 6:07 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>> >> >> > Nick Piggin <npiggin@xxxxxxxxx> writes:
>> >> >> >> Do you agree with the theoretical problem? I didn't try to
>> >> >> >> write a racer to break it yet. Inserting a delay before the
>> >> >> >> get_ioctx might do the trick.
>> >> >> >
>> >> >> > I'm not convinced, no.  The last reference to the kioctx is always the
>> >> >> > process, released in the exit_aio path, or via sys_io_destroy.  In both
>> >> >> > cases, we cancel all aios, then wait for them all to complete before
>> >> >> > dropping the final reference to the context.
>> >> >>
>> >> >> That wouldn't appear to prevent a concurrent thread from doing an
>> >> >> io operation that requires ioctx lookup, and taking the last reference
>> >> >> after the io_cancel thread drops the ref.
>> >> >>
>> >> >> > So, while I agree that what you wrote is better, I remain unconvinced of
>> >> >> > it solving a real-world problem.  Feel free to push it in as a cleanup,
>> >> >> > though.
>> >> >>
>> >> >> Well I think it has to be technically correct first. If there is indeed a
>> >> >> guaranteed ref somehow, it just needs a comment.
>> >> >  Hmm, the code in io_destroy() indeed looks fishy. We delete the ioctx
>> >> > from the hash table and set ioctx->dead which is supposed to stop
>> >> > lookup_ioctx() from finding it (see the !ctx->dead check in
>> >> > lookup_ioctx()). There's even a comment in io_destroy() saying:
>> >> >        /*
>> >> >         * Wake up any waiters.  The setting of ctx->dead must be seen
>> >> >         * by other CPUs at this point.  Right now, we rely on the
>> >> >         * locking done by the above calls to ensure this consistency.
>> >> >         */
>> >> > But since lookup_ioctx() is called without any lock or barrier nothing
>> >> > really seems to prevent the list traversal and ioctx->dead test to happen
>> >> > before io_destroy() and get_ioctx() after io_destroy().
>> >> >
>> >> > But wouldn't the right fix be to call synchronize_rcu() in io_destroy()?
>> >> > Because with your fix we could still return 'dead' ioctx and I don't think
>> >> > we are supposed to do that...
>> >>
>> >> With my fix we won't oops, I was a bit concerned about ->dead,
>> >> yes but I don't know what semantics it is attempted to have there.
>> >  But wouldn't it do something bad if the memory gets reallocated for
>> > something else and set to non-zero? E.g. memory corruption?
>>
>> I don't see how it would with my patch.
>  Ah, you are right. Since the whole structure gets freed only after the RCU
> period expires, we are guaranteed to see 0 in ctx->users until we drop
> rcu_read_lock. Maybe a comment like the above would be useful at the place
> where you use try_get_ioctx() but your patch works.
>
>> >> synchronize_rcu() in io_destroy()  does not prevent it from returning
>> >> as soon as lookup_ioctx drops the rcu_read_lock().
>> >  Yes, exactly. So references obtained before synchronize_rcu() would be
>> > completely fine and valid references and there would be no references after
>> > synchronize_rcu() because they'd see 'dead' set. But looking at the code
>> > again it still would not be enough because we could still race with
>> > io_submit_one() adding new IO to the ioctx which will be freed just after
>> > put_ioctx() in do_io_submit().
>> >
>> > The patch below implements what I have in mind - it should be probably
>> > split into two but I'd like to hear comments on that before doing these
>> > cosmetic touches ;)
>>
>> Well this seems to solve it too, but it is 2 problems here. It is changing
>> the semantics of io_destroy which requires the big synchronize_rcu()
>> hammer.
>>
>> But I don't believe that's necessarily desirable, or required. In fact it is
>> explicitly not reuired because it only says that it _may_ cancel outstanding
>> requests.
>  Well, we are not required to cancel all the outstanding AIO because of the
> API requirement, that's granted. But we must do it because of the way how
> the code is written. Outstanding IO requests reference ioctx but they are
> not counted in ctx->users but in ctx->reqs_active. So the code relies on
> the fact that the reference held by the hash table protects ctx from being
> freed and io_destroy() waits for requests before dropping the last
> reference to ctx. But there's the second race I describe making it possible
> for new IO to be created after io_destroy() has waited for all IO to
> finish...

Yes there is that race too I agree. I just didn't follow through the code far
enough to see it was a problem -- I thought it was by design.

I'd like to solve it without synchronize_rcu() though.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/