Re: [PATCH v4] net: rose: fix null-ptr-deref caused by rose_kill_by_neigh

From: duoming
Date: Sat Jul 02 2022 - 20:43:38 EST


Hello,

On Sat, 2 Jul 2022 12:01:08 -0700 Jakub Kicinski wrote:

> On Sat, 2 Jul 2022 15:23:57 +0800 (GMT+08:00) duoming@xxxxxxxxxx wrote:
> > > On Wed, 29 Jun 2022 18:49:41 +0800 Duoming Zhou wrote:
> > > > When the link layer connection is broken, the rose->neighbour is
> > > > set to null. But rose->neighbour could be used by rose_connection()
> > > > and rose_release() later, because there is no synchronization among
> > > > them. As a result, the null-ptr-deref bugs will happen.
> > > >
> > > > One of the null-ptr-deref bugs is shown below:
> > > >
> > > >       (thread 1)                |        (thread 2)
> > > >                                 | rose_connect
> > > > rose_kill_by_neigh              |   lock_sock(sk)
> > > >   spin_lock_bh(&rose_list_lock) |   if (!rose->neighbour)
> > > >   rose->neighbour = NULL;//(1)  |
> > > >                                 |   rose->neighbour->use++;//(2)
> > >
> > > > if (rose->neighbour == neigh) {
> > >
> > > Why is it okay to perform this comparison without the socket lock,
> > > if we need a socket lock to clear it? Looks like rose_kill_by_neigh()
> > > is not guaranteed to clear all the uses of a neighbor.
> >
> > I am sorry, the comparison should also be protected with the socket lock.
> > rose_kill_by_neigh() only clears the neighbour that is passed to it
> > as its parameter.
>
> Don't think that's possible, you'd have to drop the neigh lock every
> time.

The neighbour is cleared in two situations.

(1) When the rose device goes down, rose_link_device_down() traverses
rose_neigh_list and calls rose_kill_by_neigh() for every neighbour
that is bound to that device.

void rose_link_device_down(struct net_device *dev)
{
	struct rose_neigh *rose_neigh;

	for (rose_neigh = rose_neigh_list; rose_neigh != NULL; rose_neigh = rose_neigh->next) {
		if (rose_neigh->dev == dev) {
			rose_del_route_by_neigh(rose_neigh);
			rose_kill_by_neigh(rose_neigh);
		}
	}
}

https://elixir.bootlin.com/linux/v5.19-rc4/source/net/rose/rose_route.c#L839

(2) When the level 2 link times out, rose_link_failed() calls
rose_kill_by_neigh() to clear the rose_neigh.

https://elixir.bootlin.com/linux/v5.19-rc4/source/net/rose/rose_route.c#L813

> > > > + sock_hold(s);
> > > > + spin_unlock_bh(&rose_list_lock);
> > > > + lock_sock(s);
> > > > rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
> > > > rose->neighbour->use--;
> > >
> > > What protects the use counter?
> >
> > The use counter is protected by socket lock.
>
> Which one, the neigh object can be shared by multiple sockets, no?

sk_for_each() traverses rose_list, and rose_kill_by_neigh() takes the lock
of each socket it extracts from the list (lock_sock()) before it touches
the use counter through that socket.

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index bf2d986a6bc..6d5088b030a 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -165,14 +165,26 @@ void rose_kill_by_neigh(struct rose_neigh *neigh)
 	struct sock *s;

 	spin_lock_bh(&rose_list_lock);
+again:
 	sk_for_each(s, &rose_list) {
 		struct rose_sock *rose = rose_sk(s);

+		sock_hold(s);
+		spin_unlock_bh(&rose_list_lock);
+		lock_sock(s);
 		if (rose->neighbour == neigh) {
 			rose_disconnect(s, ENETUNREACH, ROSE_OUT_OF_ORDER, 0);
 			rose->neighbour->use--;
 			rose->neighbour = NULL;
+			release_sock(s);
+			sock_put(s);
+			spin_lock_bh(&rose_list_lock);
+			goto again;
 		}
+		release_sock(s);
+		sock_put(s);
+		spin_lock_bh(&rose_list_lock);
+		goto again;
 	}
 	spin_unlock_bh(&rose_list_lock);
 }
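
For illustration only, the hold / unlock / lock / restart pattern used in
the diff can be sketched in user space with pthreads. This is not the
kernel code: struct neigh, struct sock_entry, list_lock and
kill_by_neigh() are hypothetical stand-ins, a pthread mutex plays the role
of lock_sock(), and a plain counter plays the role of
sock_hold()/sock_put(). The sketch also assumes an atomic use counter and
an atomic neighbour pointer, which is not what the patch does.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rose_neigh. */
struct neigh {
	atomic_int use;		/* may be shared by several entries */
};

/* Hypothetical stand-in for struct sock plus struct rose_sock. */
struct sock_entry {
	struct sock_entry *next;
	pthread_mutex_t lock;			/* plays the role of lock_sock() */
	atomic_int refcnt;			/* plays the role of sock_hold()/sock_put() */
	_Atomic(struct neigh *) neighbour;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sock_entry *sock_list;

/* Detach every entry from neigh; returns how many entries were detached. */
static int kill_by_neigh(struct neigh *neigh)
{
	struct sock_entry *s;
	int killed = 0;

	pthread_mutex_lock(&list_lock);
again:
	for (s = sock_list; s != NULL; s = s->next) {
		/* Hint only, taken under the list lock; rechecked below. */
		if (atomic_load(&s->neighbour) != neigh)
			continue;

		atomic_fetch_add(&s->refcnt, 1);	/* pin the entry */
		pthread_mutex_unlock(&list_lock);	/* do not block under the list lock */

		pthread_mutex_lock(&s->lock);
		if (atomic_load(&s->neighbour) == neigh) {
			/* Authoritative check, made under the entry lock. */
			atomic_fetch_sub(&neigh->use, 1);
			atomic_store(&s->neighbour, NULL);
			killed++;
		}
		pthread_mutex_unlock(&s->lock);
		atomic_fetch_sub(&s->refcnt, 1);	/* unpin the entry */

		/* The list may have changed while it was unlocked. */
		pthread_mutex_lock(&list_lock);
		goto again;
	}
	pthread_mutex_unlock(&list_lock);
	return killed;
}
```

The sketch restarts the walk only after an entry matched the hint. Each
restart follows an entry whose neighbour has just been cleared, so the
walk makes progress unless another thread keeps attaching entries to the
same neigh.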

Best regards,
Duoming Zhou