Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting

From: Oleg Nesterov
Date: Tue Nov 25 2014 - 13:15:34 EST


On 11/25, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@xxxxxxxxxx> writes:
>
> > On 11/24, Eric W. Biederman wrote:
> >>
> >> However at the moment my I can't figure out if it is safe to move
> >> get_pid_ns elow hlist_add_head_rcu. Because once we are on the rcu list
> >> the pid is findable, and being publicly visible with a bad refcount could cause
> >> problems.
> >
> > The caller has a reference, this ns can't go away. Obviously, otherwise
> > get_pid_ns(ns) is not safe.
> >
> > We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously
> > won't be called until we return this pid, otherwise everything is wrong.
> >
> > So I think this should be safe?
>
> My concern is exposing a half initialized struct pid to the world via an
> rcu data structure. In particular could one of the rcu users get into
> trouble because we haven't called get_pid_ns yet? That is unclear to me.

They can't. This pid was fully initialized, in particular
pid->numbers[pid->level].ns == ns has a reference.

Just it is not ready for put_pid() which will be called by the "owner" of
this pid, the caller or the new child. So in this sense it doesn't matter
when we call get_pid_ns(), just we need to do this before return.

> That is one of those weird nasty races I would rather not have to
> consider and moving the get_pid_ns after hlist_add requires that we
> think about it.
>
> To fix the error handling and avoid thinking about the races we have two
> choices:
> - In the error path that is currently called out_unlock we can drop the
> extra references.
> - Immediately after we perform the test that on error jumps to out_unlock
> we call get_pid_ns.
>
> My preference would be the first, as it is a trivially correct one line
> change.
>
> Aka I think this is the obviously correct trivial fix.
>
> out_unlock:
> spin_unlock_irq(&pidmap_lock);
> + put_pid_ns(ns);

Sure, initially I was going to do this. But this is sub-optimal imo, I mainly
mean less clear (imho).

But again, I won't argue. I'll send V2 once we finish the discussion about 2/2.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/