Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting

From: Eric W. Biederman
Date: Tue Nov 25 2014 - 13:44:36 EST


Oleg Nesterov <oleg@xxxxxxxxxx> writes:

> On 11/25, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>>
>> > On 11/24, Eric W. Biederman wrote:
>> >>
>> >> However at the moment my I can't figure out if it is safe to move
>> >> get_pid_ns elow hlist_add_head_rcu. Because once we are on the rcu list
>> >> the pid is findable, and being publicly visible with a bad refcount could cause
>> >> problems.
>> >
>> > The caller has a reference, this ns can't go away. Obviously, otherwise
>> > get_pid_ns(ns) is not safe.
>> >
>> > We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously
>> > won't be called until we return this pid, otherwise everything is wrong.
>> >
>> > So I think this should be safe?
>>
>> My concern is exposing a half initialized struct pid to the world via an
>> rcu data structure. In particular could one of the rcu users get into
>> trouble because we haven't called get_pid_ns yet? That is unclear to me.
>
> They can't. This pid was fully initialized, in particular
> pid->numbers[pid->level].ns == ns has a reference.
>
> Just it is not ready for put_pid() which will be called by the "owner" of
> this pid, the caller or the new child. So in this sense it doesn't matter
> when we call get_pid_ns(), just we need to do this before return.

Or by someone calling find_get_pid() ... put_pid().

Now the reference count should not hit zero in that case but I hate to
think of that case separately.

>> That is one of those weird nasty races I would rather not have to
>> consider and moving the get_pid_ns after hlist_add requires that we
>> think about it.
>>
>> To fix the error handling and avoid thinking about the races we have two
>> choices:
>> - In the error path that is currently called out_unlock we can drop the
>> extra references.
>> - Immediately after we perform the test that on error jumps to out_unlock
>> we call get_pid_ns.
>>
>> My preference would be the first, as it is a trivially correct one line
>> change.
>>
>> Aka I think this is the obviously correct trivial fix.
>>
>> out_unlock:
>> spin_unlock_irq(&pidmap_lock);
>> + put_pid_ns(ns);
>
> Sure, initially I was going to do this. But this is sub-optimal imo, I mainly
> mean less clear (imho).
>
> But again, I won't argue. I'll send V2 once we finish the discussion
> about 2/2.

At this point, and especially since we need to Cc stable and get this
fix backported to who knows how many kernel releases having something
that is trivial to validate is correct is important.

If you prefer to call get_pid_ns() immedately after:

if (!(ns->nr_hashed & PIDNS_HASH_ADDING))
goto out_unlock;

That would be fine with me as well.

Anything else to too clever for my brain to verify is correct today.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/