Re: [syzbot] KASAN: use-after-free Write in sco_sock_timeout

From: Desmond Cheong Zhi Xi
Date: Sun Aug 29 2021 - 14:34:23 EST


On 29/8/21 10:53 pm, Desmond Cheong Zhi Xi wrote:
On 29/8/21 4:29 pm, Hillf Danton wrote:
On Fri, 27 Aug 2021 15:58:34 +0800 Desmond Cheong Zhi Xi wrote:
On 27/8/21 9:19 am, Hillf Danton wrote:
On Thu, 26 Aug 2021 09:29:24 -0700
syzbot found the following issue on:

HEAD commit:    e3f30ab28ac8 Merge branch 'pktgen-samples-next'
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=13249c96300000
kernel config: https://syzkaller.appspot.com/x/.config?x=ef482942966bf763
dashboard link: https://syzkaller.appspot.com/bug?extid=2bef95d3ab4daa10155b
compiler:       gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.1
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16a29ea9300000

The issue was bisected to:

commit e1dee2c1de2b4dd00eb44004a4bda6326ed07b59
Author: Desmond Cheong Zhi Xi <desmondcheongzx@xxxxxxxxx>
Date:   Tue Aug 10 04:14:10 2021 +0000

      Bluetooth: fix repeated calls to sco_sock_kill

To fix the uaf, grab another hold to sock to make the timeout work safe.

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git e3f30ab28ac8

--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -190,15 +190,14 @@ static void sco_conn_del(struct hci_conn
      sco_conn_unlock(conn);
      if (sk) {
-        sock_hold(sk);
          lock_sock(sk);
          sco_sock_clear_timer(sk);
          sco_chan_del(sk, err);
          release_sock(sk);
-        sock_put(sk);
          /* Ensure no more work items will run before freeing conn. */
          cancel_delayed_work_sync(&conn->timeout_work);
+        sock_put(sk);

Hi Hillf,

Saw that this passed the reproducer. But on closer inspection, I think what's happening is that sco_conn_del is never run.

So the extra sock_hold prevents a UAF, but only because the reference count now never drops to 0. In my opinion, something closer to your previous proposal (plus addressing the other calls to __sco_sock_close), where we call cancel_delayed_work_sync after the channel is deleted, would address the root cause better.

Just my two cents.
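
To illustrate the point about the reference count, here is a minimal user-space sketch. It is not kernel code: toy_sock, toy_hold, toy_put and toy_freed_after are hypothetical stand-ins for struct sock, sock_hold and sock_put, and the assumption is that the extra hold is only dropped in a path (sco_conn_del in the patch above) that never runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock; only the refcount is modelled. */
struct toy_sock {
	int refcnt;
	bool freed;
};

/* Models sock_hold(): take one more reference. */
static void toy_hold(struct toy_sock *sk)
{
	assert(!sk->freed);
	sk->refcnt++;
}

/* Models sock_put(): drop a reference, "free" the object at zero. */
static void toy_put(struct toy_sock *sk)
{
	assert(!sk->freed && sk->refcnt > 0);
	if (--sk->refcnt == 0)
		sk->freed = true;
}

/*
 * Start from one initial reference, take `holds` extra references and
 * drop `puts` references. Returns true if the object ended up freed.
 * Callers must keep puts <= holds + 1.
 */
static bool toy_freed_after(int holds, int puts)
{
	struct toy_sock sk = { .refcnt = 1, .freed = false };
	int i;

	for (i = 0; i < holds; i++)
		toy_hold(&sk);
	for (i = 0; i < puts; i++)
		toy_put(&sk);
	return sk.freed;
}
```

With one extra hold and only the normal put, toy_freed_after(1, 1) reports that the object is never freed: the UAF disappears, but only because the socket is leaked.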


OK, I went back and did a more thorough audit. Even without calling
cancel_delayed_work_sync, sco_sock_timeout should not cause a UAF.

I believe the real issue is that we can allocate a connection twice in
sco_connect. This means that the first connection gets lost and we're
unable to clean it up properly.

Thoughts on this?
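
The suspected double allocation can be sketched as a check-then-act problem. The following is a single-threaded user-space model, not kernel code: toy_sk, toy_check and toy_connect are hypothetical names, and the interleaving of two connect() callers is written out by hand rather than produced by real threads. It assumes the state check corresponds to the BT_OPEN/BT_BOUND test and that the allocation happens later, under the socket lock.

```c
#include <stdbool.h>

enum toy_state { TOY_OPEN, TOY_CONNECT };

/* Hypothetical stand-in for the socket; counts allocated connections. */
struct toy_sk {
	enum toy_state state;
	int conns_allocated;
};

/* Models the sk_state check in sco_sock_connect. */
static bool toy_check(const struct toy_sk *sk)
{
	return sk->state == TOY_OPEN;
}

/* Models the locked section: allocate a connection, change state. */
static void toy_connect(struct toy_sk *sk)
{
	sk->conns_allocated++;
	sk->state = TOY_CONNECT;
}

/* Check before taking the lock: both callers pass the check first. */
static int toy_check_outside_lock(void)
{
	struct toy_sk sk = { TOY_OPEN, 0 };
	bool a_ok = toy_check(&sk);	/* caller A, unlocked */
	bool b_ok = toy_check(&sk);	/* caller B, unlocked */

	if (a_ok)
		toy_connect(&sk);	/* caller A, under lock */
	if (b_ok)
		toy_connect(&sk);	/* caller B acts on a stale check */
	return sk.conns_allocated;
}

/* Check after taking the lock: each caller checks and acts atomically. */
static int toy_check_inside_lock(void)
{
	struct toy_sk sk = { TOY_OPEN, 0 };

	if (toy_check(&sk))		/* caller A: lock, check, connect */
		toy_connect(&sk);
	if (toy_check(&sk))		/* caller B: lock, check fails */
		toy_connect(&sk);
	return sk.conns_allocated;
}
```

In this model the unlocked check lets two connections be allocated for one socket, so the first is lost, while the locked check allocates exactly one.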

#syz test: git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git e3f30ab28ac8

--- a/net/bluetooth/sco.c
+++ b/net/bluetooth/sco.c
@@ -578,9 +578,6 @@ static int sco_sock_connect(struct socket *sock, struct sockaddr *addr, int alen
 	    addr->sa_family != AF_BLUETOOTH)
 		return -EINVAL;
-	if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND)
-		return -EBADFD;
-
 	if (sk->sk_type != SOCK_SEQPACKET)
 		return -EINVAL;
@@ -591,6 +588,13 @@ static int sco_sock_connect(struct socket *sock, struct sockaddr *addr, int alen
 	lock_sock(sk);
+	if (sk->sk_state != BT_OPEN && sk->sk_state != BT_BOUND) {
+		hci_dev_unlock(hdev);
+		hci_dev_put(hdev);
+		err = -EBADFD;
+		goto done;
+	}
+
 	/* Set destination address and psm */
 	bacpy(&sco_pi(sk)->dst, &sa->sco_bdaddr);