Re: [lockdep] UAF read in print_name().

From: Waiman Long
Date: Sun Jan 02 2022 - 21:35:36 EST


On 1/1/22 13:02, Waiman Long wrote:
On 12/30/21 10:09, Tetsuo Handa wrote:
On 2021/12/29 12:25, Waiman Long wrote:
On 12/28/21 05:49, Tetsuo Handa wrote:
Hello.

I found, using linux-next-20211210, that reading /proc/lockdep after a lockdep splat
triggers a UAF read access. I think this is a side effect of zapping dependency
information when the loop driver's WQ is destroyed. You might want to xchg() the pointer
with a dummy struct containing a static string.

difference before lockdep splat and after lockdep splat
----------------------------------------
8635c8636
< ffff88811561cd28 OPS:      26 FD:  122 BD:    1 +.+.: (wq_completion)loop0
---
> ffff88811561cd28 OPS:      31 FD: 439 BD:    1 +.+.:  M>^MM-^AM-^HM-^?M-^?

Thanks for reporting.

Yes, listing lock classes via /proc/lockdep is racy, as all_lock_classes is accessed
without lock protection. OTOH, we probably can't fix this race, as the lock hold time would be
too long for this case. Atomically xchg'ing the class name is a possible workaround, but we
also need to add additional checks as the iteration may also be redirected to
free_lock_classes leading to an endless iteration loop.
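
For illustration only, the xchg() idea discussed above might look roughly like the following userspace sketch using C11 atomics. Everything here (struct demo_class, demo_zap_name(), demo_read_name(), the "<zapped>" placeholder) is hypothetical and is not lockdep's actual code or API. It only shows a name pointer being swapped for a static string before the dynamically allocated name is freed. It does not protect a reader that loaded the old pointer before the swap, and it does not address the iteration problem mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a lock class entry; NOT the kernel's struct lock_class. */
struct demo_class {
	_Atomic(const char *) name;
};

/* Static placeholder that stays valid for the lifetime of the program. */
static const char zapped_name[] = "<zapped>";

/* Atomically swap in the placeholder; return the old pointer so the caller can free it. */
static const char *demo_zap_name(struct demo_class *c)
{
	return atomic_exchange(&c->name, zapped_name);
}

/* A reader loads whichever pointer is currently published. */
static const char *demo_read_name(struct demo_class *c)
{
	return atomic_load(&c->name);
}

/* Small walk-through: publish a heap-allocated name, zap it, then read again. */
static int demo_zap_example(void)
{
	struct demo_class c;
	char *dyn = malloc(32);

	if (!dyn)
		return -1;
	strcpy(dyn, "(wq_completion)loop0");
	atomic_init(&c.name, dyn);

	/* Swap first, free afterwards: later readers only see the static string. */
	free((void *)demo_zap_name(&c));

	return strcmp(demo_read_name(&c), zapped_name) == 0 ? 0 : -1;
}
```

Calling demo_zap_example() from a main() should return 0. In the kernel, a reader that is still printing the old string at the time of the swap would need some additional lifetime guarantee, which this sketch leaves out.
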
Thanks for responding. But is this bug really unfixable?

I am not saying that it is unfixable. I am just saying that we cannot guarantee a consistent output of /proc/lockdep as internal data may change in the middle of dumping the output.

Please see the following result.

----------------------------------------
[root@localhost ~]# uname -r
5.16.0-rc4-next-20211210
[root@localhost ~]# grep loop /proc/lockdep
[root@localhost ~]# truncate -s 100m testfile
[root@localhost ~]# losetup -f testfile
[root@localhost ~]# grep loop /proc/lockdep
ffffffffa02b73c8 OPS:      17 FD:   34 BD:    1 +.+.: loop_ctl_mutex
ffff888106fb0528 OPS:     114 FD:  183 BD:    1 +.+.: (wq_completion)loop0
[root@localhost ~]# losetup -D
[root@localhost ~]# grep loop /proc/lockdep
ffffffffa02b73c8 OPS:      17 FD:   34 BD:    1 +.+.: loop_ctl_mutex
ffffffffa02b7328 OPS:       1 FD:    1 BD:    1 +.+.: loop_validate_mutex
[root@localhost ~]# losetup -f testfile
[root@localhost ~]# grep loop /proc/lockdep
ffffffffa02b73c8 OPS:      18 FD:   34 BD:    1 +.+.: loop_ctl_mutex
ffffffffa02b7328 OPS:       1 FD:    1 BD:    1 +.+.: loop_validate_mutex
ffff888106fb1128 OPS:     118 FD:  183 BD:    1 +.+.: (wq_completion)loop0
[root@localhost ~]# losetup -D
[root@localhost ~]# grep loop /proc/lockdep
ffffffffa02b73c8 OPS:      18 FD:   34 BD:    1 +.+.: loop_ctl_mutex
ffffffffa02b7328 OPS:       2 FD:    1 BD:    1 +.+.: loop_validate_mutex
[root@localhost ~]# grep debug_locks /proc/lockdep_stats
  debug_locks:                             1
[root@localhost ~]#
----------------------------------------

We can confirm that the "(wq_completion)loop0" entry disappears when WQ for /dev/loop0 is destroyed.

Then, please see the following reproducer for this lockdep problem. As you can see, there are 10
seconds between the lockdep complaint and the read of /proc/lockdep. 10 seconds should be enough time
to erase the "(wq_completion)loop0" entry.

Thanks for the reproducer.

Your reproducer can always reproduce the problem. It turns out that it is not really a race condition. The UAF problem is caused by the failure of lockdep to properly zap the "(wq_completion)loop0" lock class. I am going to send out a patch to address this bug.

Cheers,
Longman