Re: [PATCH] rtmutex: ensure we wake up the top waiter

From: Wander Lairson Costa
Date: Tue Jan 31 2023 - 12:47:24 EST


On Thu, Jan 19, 2023 at 5:54 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> Wander!
>
> On Wed, Jan 18 2023 at 15:49, Wander Lairson Costa wrote:
> > On Tue, Jan 17, 2023 at 9:05 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >> On Tue, Jan 17 2023 at 14:26, Wander Lairson Costa wrote:
> >> > rt_mutex_adjust_prio_chain() acquires the wait_lock. In the requeue
> >> > phase, the waiter may initially be at the top of the queue, but after
> >> > being dequeued and requeued that may no longer be true.
> >>
> >> That's related to your above argumentation in which way?
> >>
> >
> > I think I made the mistake of not explicitly saying at least three
> > tasks are involved:
> >
> > - A Task T1 that currently holds the mutex
> > - A Task T2 that is the top waiter
> > - A Task T3 that changes the top waiter
> >
> > T3 tries to acquire the mutex, but as T1 holds it, it calls
> > task_blocked_on_lock() and saves the owner. It eventually calls
> > rt_mutex_adjust_prio_chain(), but it releases the wait lock before
> > doing so. This opens a window for T1 to release the mutex and wake up
> > T2. Before T2 runs, T3 acquires the wait lock again inside
> > rt_mutex_adjust_prio_chain(). If the "dequeue/requeue" piece of code
> > changes the top waiter, then 1) when T2 runs, it will see that it is no
> > longer the top waiter and go back to sleep, and 2) as you observed below,
> > the waiter doesn't point to the top waiter and will therefore wake up the
> > wrong task.
>
> This is still confusing as hell because the wait locks you are talking
> about belong to different rtmutexes. So there is no "drop wait lock and
> reacquire wait lock" window.
>
> There must be (multiple) lock chains involved, but I couldn't figure out
> yet what the actual scenario is in the case of a pure rt_spinlock lock
> chain:
>

Sorry it took so long to reply. I didn't have the traces anymore and
had to regenerate them. You also spotted an error in my analysis, so I
had to start over.

> > Another piece of information I forgot: I spotted the bug in
> > spinlock_rt, which uses an rtmutex under the hood. It has a different
> > code path in the lock scenario, and there is no call to
> > remove_waiter() (unless I am missing something).
>
> Correct. But this still might be a lock chain issue where a non
> rt_spinlock allows early removal.
>
> > Anyway, you summed it up pretty well here: "@waiter is no longer the
> > top waiter due to the requeue operation". I tried (and failed) to
> > explain the call chain that ends up in the buggy scenario, but now I
> > think I should just describe the fundamental problem (the waiter
> > doesn't point to the top waiter).
>
> You really want to provide the information WHY this case can happen at
> all. If it's not the removal case and related to some other obscure lock
> chain problem, then we really need to understand the scenario because
> that lets us analyze whether there are other holes we did not think
> about yet.
>
> If you have traces which show the sequence of lock events leading to
> this problem, then you should be able to decode the scenario. If you
> fail to extract the information, then please provide the traces so we
> can stare at them.
>

Here we go:

Let L1 and L2 be two spinlocks.

Let T1 be a task holding L1 and blocked on L2. T1, currently, is the top
waiter of L2.

Let T2 be the task holding L2.

Let T3 be a task trying to acquire L1.

The following sequence of events leads to a state in which L2's wait
queue is not empty, but the lock has no owner and no waiter is left
running to take it.

T1                      T2                      T3
==                      ==                      ==

                                                spin_lock(L1)
                                                raw_spin_lock(L1->wait_lock)
                                                rtlock_slowlock_locked(L1)
                                                task_blocks_on_rt_mutex(L1, T3)
                                                orig_waiter->lock = L1
                                                orig_waiter->task = T3
                                                raw_spin_unlock(L1->wait_lock)
                                                rt_mutex_adjust_prio_chain(T1, L1, L2, orig_waiter, T3)

                        spin_unlock(L2)
                        rt_mutex_slowunlock(L2)
                        raw_spin_lock(L2->wait_lock)
                        wakeup(T1)
                        raw_spin_unlock(L2->wait_lock)

                                                waiter = T1->pi_blocked_on
                                                waiter == rt_mutex_top_waiter(L2)
                                                waiter->task == T1
                                                raw_spin_lock(L2->wait_lock)
                                                dequeue(L2, waiter)
                                                update_prio(waiter, T1)
                                                enqueue(L2, waiter)
                                                waiter != rt_mutex_top_waiter(L2)
                                                L2->owner == NULL
                                                wakeup(T1)
                                                raw_spin_unlock(L2->wait_lock)

T1 wakes up
T1 != top_waiter(L2)
schedule_rtlock()
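
The wakeup(T1) at the end of T3's column is the one that goes wrong. It
comes from the owner-less lock case in rt_mutex_adjust_prio_chain();
simplified, the relevant check in kernel/locking/rtmutex.c looks roughly
like this (paraphrased, not a verbatim quote):

  /* chain walk, after the dequeue/requeue of @waiter on @lock */
  if (!rt_mutex_owner(lock)) {
          /*
           * If the requeue above changed the top waiter, then wake the
           * new top waiter up to try to get the lock.
           */
          if (prerequeue_top_waiter != rt_mutex_top_waiter(lock))
                  wake_up_state(waiter->task, waiter->wake_state);
          raw_spin_unlock_irq(&lock->wait_lock);
          return 0;
  }

In the trace above, lock == L2 and waiter->task == T1, so the check
correctly detects that the top waiter changed, but it still wakes T1,
which is no longer the top waiter of L2.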

What I observed is that T1's deadline is updated before the call to
update_prio(), and with the new deadline T1 is no longer the top waiter
in L2's wait queue.
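
So the check itself fires at the right time; the problem is which task
it wakes. A simplified sketch of what the patch is meant to do (not the
exact diff) is to wake whatever task is the top waiter after the
requeue:

  if (!rt_mutex_owner(lock)) {
          /* wake the task that is the top waiter *after* the requeue */
          top_waiter = rt_mutex_top_waiter(lock);
          if (prerequeue_top_waiter != top_waiter)
                  wake_up_state(top_waiter->task, top_waiter->wake_state);
          raw_spin_unlock_irq(&lock->wait_lock);
          return 0;
  }

With that, the new top waiter of L2 gets woken and can take the lock,
while T1 simply goes back to sleep.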

Does that make sense?