Re: Question on task_blocks_on_rt_mutex()

From: Paul E. McKenney
Date: Wed Sep 02 2020 - 11:54:17 EST


On Tue, Sep 01, 2020 at 06:51:28PM -0700, Davidlohr Bueso wrote:
> On Tue, 01 Sep 2020, Paul E. McKenney wrote:
>
> > And it appears that a default-niced CPU-bound SCHED_OTHER process is
> > not preempted by a newly awakened MAX_NICE SCHED_OTHER process. OK,
> > OK, I never waited for more than 10 minutes, but on my 2.2GHz that is
> > close enough to a hang for most people.
> >
> > Which means that the patch below prevents the hangs. And maybe does
> > other things as well, firing rcutorture up on it to check.
> >
> > But is this indefinite delay expected behavior?
> >
> > This reproduces for me on current mainline as follows:
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --torture lock --duration 3 --configs LOCK05
> >
> > This hangs within a minute of boot on my setup. Here "hangs" is defined
> > as stopping the per-15-second console output of:
> > Writes: Total: 569906696 Max/Min: 81495031/63736508 Fail: 0
>
> Ok this doesn't seem to be related to lockless wake_qs then. fyi there have
> been missed wakeups in the past where wake_q_add() fails the cmpxchg because
> the task is already pending a wakeup leading to the actual wakeup occurring
> before its corresponding wake_up_q(). This is why we have wake_q_add_safe().
> But for rtmutexes, because there is no lock stealing only top-waiter is awoken
> as well as try_to_take_rt_mutex() is done under the lock->wait_lock I was not
> seeing an actual race here.

This problem is avoided if stutter_wait() does the occasional sleep.
I would have expected preemption to take effect, but even setting the
kthreads in stutter_wait() to MAX_NICE doesn't help. The current fix
destroys the intended instant-on nature of stutter_wait(), so the eventual
fix will need to use hrtimer-based sleeps or some such.
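
To make the "occasional sleep" idea concrete, here is a rough userspace
sketch, not the kernel code. The names stutter_pause and
stutter_wait_sketch(), the spin counts, and the 100-microsecond nap are
all made up for illustration, and nanosleep() stands in for whatever
sleep primitive the real fix ends up using. It only shows the shape of
the loop: spin for fast response, but block every so often so that other
runnable tasks get the CPU.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the flag that tells testers to pause. */
static atomic_bool stutter_pause;

/*
 * Hypothetical helper: spin while the pause flag is set, but block in
 * nanosleep() once every spins_per_sleep iterations so that the CPU is
 * given up now and then.  max_spins bounds the loop so that the sketch
 * terminates even if nobody clears the flag.  Returns the number of
 * sleeps taken.
 */
static unsigned long stutter_wait_sketch(unsigned long spins_per_sleep,
					 unsigned long max_spins)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 100000 };
	unsigned long spins = 0;
	unsigned long sleeps = 0;

	while (atomic_load_explicit(&stutter_pause, memory_order_acquire) &&
	       spins < max_spins) {
		if (++spins % spins_per_sleep == 0) {
			nanosleep(&nap, NULL);	/* the occasional sleep */
			sleeps++;
		}
	}
	return sleeps;
}
```

The tradeoff is visible even in this toy: the larger spins_per_sleep is,
the closer the loop gets to instant-on behavior, and the longer a
low-priority task can be kept off the CPU.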

Thanx, Paul