Re: [PATCH v3] netlink: Fix autobind race condition that leads to zero port ID

From: Tejun Heo
Date: Fri Sep 18 2015 - 09:37:30 EST


Hello, Herbert.

On Fri, Sep 18, 2015 at 02:36:10PM +0800, Herbert Xu wrote:
> On Thu, Sep 17, 2015 at 07:30:34AM -0400, Tejun Heo wrote:
> >
> > Maybe add that this led to a deadlock and add a Link tag to this
> > thread?
>
> I'll add a note about the deadlock but I don't like Link tags
> because websites die and you can always just google the patch
> subject.

That's why we use http://lkml.kernel.org/r/MSG_ID links.

> > > + nlk_sk(sk)->bound = !!portid;
> >
> > !! isn't necessasry and this creates ordering between two stores.
>
> !! was necessary because we're going from a u32 to a bool.

bool casting actually collapses the source value to a boolean value.
No need for casting regardless of data type.

> @@ -1076,17 +1076,19 @@ static int netlink_insert(struct sock *sk, u32 portid)
> unlikely(atomic_read(&table->hash.nelems) >= UINT_MAX))
> goto err;
>
> - nlk_sk(sk)->portid = portid;
> + nlk_sk(sk)->rhash_portid = portid;
> sock_hold(sk);
>
> err = __netlink_insert(table, sk);
> if (err) {
> if (err == -EEXIST)
> err = -EADDRINUSE;
> - nlk_sk(sk)->portid = 0;
> sock_put(sk);
> + goto err;
> }
>
> + nlk_sk(sk)->portid = portid;

So, this doesn't necessarily make the ordering problem go away. The
hash lookup would be fine but imagine a code path like the following.

rcu_read_lock();
sock = rhash lookup(some port number);
do some operation which may use sock->portid;
rcu_read_unlock();

Now, that some operation may see 0 as the port number. I don't think
you can avoid doing some type of memory barrier operations if you
wanna gate autobind w/o grabbing locks.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/