Re: [PATCH] rtmutex: ensure we wake up the top waiter

From: Thomas Gleixner
Date: Tue Jan 17 2023 - 19:37:26 EST


Wander!

On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote:
> In task_blocked_on_lock() we save the owner, release the wait_lock and
> call rt_mutex_adjust_prio_chain(). Before we acquire the wait_lock
> again, the owner may release the lock and deboost.

This does not make sense in several aspects:

1) Who is 'we'? You, me, someone else? None of us does any of the
above.

https://www.kernel.org/doc/html/latest/process/maintainer-tip.html#changelog

2) What does task_blocked_on_lock() have to do with the logic in
rt_mutex_adjust_prio_chain(), which is called from other call sites
too?

3) If the owner releases the lock and deboosts, then this has
absolutely nothing to do with the lock, because the priority of
the owner is determined by its own priority and the priority of the
topmost waiter. If the owner releases the lock, then it marks the
lock ownerless, wakes the topmost waiter and deboosts itself.
rt_mutex_adjust_prio_chain() is not involved in this owner deboost
at all. Why?

Because the owner deboost does not affect the priority of the
waiters at all. It's the other way round: Waiter priority affects
the owner priority if the waiter priority is higher than the owner
priority.

> rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue
> phase, waiter may be initially in the top of the queue, but after
> dequeued and requeued it may no longer be true.

How is that related to your argumentation above?

rt_mutex_adjust_prio_chain()

        /* lock->wait_lock is held across the whole operation */

        prerequeue_top_waiter = rt_mutex_top_waiter(lock);

This saves the current top waiter before the dequeue()/enqueue()
sequence.

        rt_mutex_dequeue(lock, waiter);
        waiter_update_prio(waiter, task);
        rt_mutex_enqueue(lock, waiter);

        if (!rt_mutex_owner(lock)) {

This is the case where the lock has no owner, i.e. the previous owner
unlocked and the chainwalk cannot be continued.

Now the code checks whether the requeue changed the top waiter task:

                if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))

What can make this condition true?

1) @waiter is the new top waiter due to the requeue operation

2) @waiter is no longer the top waiter due to the requeue operation

So in both cases the new top waiter must be woken up so it can take over
the ownerless lock.

Here is where the code is buggy: it wakes @waiter's task, which only
covers case #1 but not case #2, right?

So your patch is correct, but the explanation in your changelog has
absolutely nothing to do with the problem.

Why?

#2 is caused by a top waiter dropping out due to a signal or timeout
and thereby deboosting the whole lock chain.

So the relevant call chain that causes the problem originates from
remove_waiter().

See?

Thanks,

tglx