Re: [PATCH RFC] rtmutex: Fix BUG_ON at kernel/locking/rtmutex.c:1331!

From: yuxin.ye
Date: Fri Jul 22 2022 - 04:02:14 EST


On Thu, Jul 21, 2022 at 02:14:16PM -0400, Waiman Long wrote:
>
> On 7/21/22 03:17, yuxin.ye wrote:
> > On Wed, Jul 20, 2022 at 10:25:17PM -0400, Waiman Long wrote:
> > > On 7/20/22 03:28, yuxin.ye wrote:
> > > > before rt_mutex_adjust_prio_chain(),unlock lock->wait_lock will cause
> > > > BUG_ON at kernel/locking/rtmutex.c:1331:
> > > The current upstream kernel/locking/rtmutex.c has no BUG_ON() call. Which
> > > version of the kernel are you using?
> > >
> > > Cheers,
> > > Longman
> > >
> > The Linux version is 5.10.
> > The upstream has indeed removed the BUG_ON, But in rt_mutex_adjust_prio_chain()
> > it is still possible to have a thread is blocked by two locks. Can this situation
> > be ignored without BUG_ON?
>
> No. However, we don't remove the lock like what you do with your patch. It
> will corrupt the data if multiple CPUs are allowed to run
> rt_mutex_adjust_prio_chain() for the same rt_mutex simultaneously. You need
> to find a way to fix the underlying problem.
>
> BTW, I still can't see a BUG_ON at line 1331 of rtmutex.c with a v5.10
> kernel. Does your source tree have some out-of-tree patches that modifies
> rtmutex?
>
> Cheers,
> Longman
>

Yes, I'm sorry I overlooked that earlier. We applied the RT patch,and
the BUG_ON are also introduced by these patches.

Back to the question, I think remove the wait_lock unlock before
rt_mutex_adjust_prio_chain() is more likely to protect some data. The
commont on task_blocks_on_rt_mutex() indicates that must be called with
wait_lock held, but it unlock before call rt_mutex_adjust_prio_chain().
This may cause the owner thread to unlock the orig_lock and exit the
thead. Finally, when calling put_task_struct(owner) in
rt_mutex_adjust_prio_chain(), the thread is blocked by another lock that
is deeply hidden.

Actully, I'm not sure why rt_mutex_adjust_prio_chain()
dosen't need wait_lock protection.

Thanks again.