Re: fs/coda oops bisected to (925b9cd1b8) "locking/rwsem: Make owner store task pointer of last owning reader"

From: Waiman Long
Date: Fri Mar 29 2019 - 13:53:45 EST


On 03/29/2019 12:10 PM, Jan Harkes wrote:
> I was testing Coda on the 5.1-rc2 kernel and noticed that when I run a
> binary out of /coda, the binary would never exit and the system would
> detect a soft lockup. I narrowed it down to a very simple reproducible
> case of running a statically linked executable (busybox) from /coda with
> the cwd outside of Coda, so the only Coda file reference is from the
> executable itself.
>
> I knew I definitely had never seen this problem with the stable kernel
> on Ubuntu xenial (4.4) so I bisected between v4.4 and v5.1-rc2 and ended
> up at
>
> # first bad commit: [925b9cd1b89a94b7124d128c80dfc48f78a63098]
> # locking/rwsem: Make owner store task pointer of last owning reader
>
> When I revert this particular commit on 5.1-rc2, I am not able to
> reproduce the problem anymore.
>
> The puzzling thing to me is that a lot of that particular patch touches
> codepaths that are not even enabled in the kernels that I run, because I
> do not have CONFIG_DEBUG_RWSEMS enabled.
>
> $ grep RWSEM .config
> CONFIG_RWSEM_XCHGADD_ALGORITHM=y
> CONFIG_RWSEM_SPIN_ON_OWNER=y
> # CONFIG_DEBUG_RWSEMS is not set
>
> And this patch is for rwsem, while my soft lockup is on a spinlock.
> So either I have a race in fs/coda that got somehow uncovered by this
> patch, or something else is going on here but I have not been able to
> figure it out.
>
> Jan

Without CONFIG_DEBUG_RWSEMS, the only behavioral change in this patch is
an unconditional write of the task_struct pointer into sem->owner after
acquiring the read lock in down_read(). Before this patch, it did a
conditional write of 0x1 into sem->owner only if the value was not
already 0x1. The only scenario I can think of that could cause the soft
lockup you see is a use-after-free of a memory object.
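
To make the difference concrete, here is a minimal userspace sketch of
the two store patterns. It is not the kernel code: struct fake_rwsem,
struct fake_task, the writes counter and both helper names are made up
for illustration, and any flag bits the real code may keep in
sem->owner are left out.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel definitions. */
#define READER_OWNED 0x1UL

struct fake_task {
	int pid;
};

struct fake_rwsem {
	uintptr_t owner;      /* models sem->owner */
	unsigned long writes; /* counts stores to owner, illustration only */
};

/* Old behavior: store 0x1 only when owner is not already 0x1. */
static void set_reader_owned_old(struct fake_rwsem *sem)
{
	if (sem->owner != READER_OWNED) {
		sem->owner = READER_OWNED;
		sem->writes++;
	}
}

/* New behavior: always store the pointer of the task taking the read lock. */
static void set_reader_owned_new(struct fake_rwsem *sem,
				 struct fake_task *tsk)
{
	sem->owner = (uintptr_t)tsk;
	sem->writes++;
}
```

Calling the old helper twice in a row does one store, while the new
helper stores on every call. So if the rwsem's memory has already been
freed and reused, the new pattern writes a pointer into it every time a
reader gets the lock.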

Cheers,
Longman