Re: [PATCH v3] locking/rtmutex: Limit # of lock stealing for non-RT waiters

From: Waiman Long
Date: Thu Jul 07 2022 - 14:45:29 EST


On 7/7/22 14:22, Boqun Feng wrote:
On Wed, Jul 06, 2022 at 10:03:10AM -0400, Waiman Long wrote:
On 7/6/22 09:59, Waiman Long wrote:
Commit 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock
stealing") allows unlimited number of lock stealing's for non-RT
tasks. That can lead to lock starvation of non-RT top waiter tasks if
there is a constant incoming stream of non-RT lockers. This can cause
rcu_preempt self-detected stall or even task lockup in PREEMPT_RT kernel.
For example,

[77107.424943] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 1249.921363] INFO: task systemd:2178 blocked for more than 622 seconds.

Avoid this problem and ensure forward progress by limiting the
number of times that a lock can be stolen from each waiter. This patch
sets a threshold of 32. That number is arbitrary and can be changed
if needed.

Fixes: 48eb3f4fcfd3 ("locking/rtmutex: Implement equal priority lock stealing")
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
---
kernel/locking/rtmutex.c | 9 ++++++---
kernel/locking/rtmutex_common.h | 8 ++++++++
2 files changed, 14 insertions(+), 3 deletions(-)

[v3: Increase threshold to 32 and add rcu_preempt self-detected stall]
Note that I decided to increase the threshold from 10 to 32 to reduce the
potential performance impact of this change, if any. We also found that
this patch fixes some of the rcu_preempt self-detected stall problems that
we saw with the PREEMPT_RT kernel, so I added that information to the patch
description.

Have you considered (and tested) whether we can set the threshold
directly proportional to nr_cpu_ids? Because IIUC, the favorable case
for lock stealing is that every CPU gets a chance to steal once. If one
CPU can steal twice, 1) either there is a context switch between two
tasks, which costs about as much as waking up the waiter, or 2) a task drops
and re-grabs a lock, which means the task wants to yield to other
waiters of the lock.

There is no inherent restriction against the same CPU stealing the lock twice or more. With rtmutex, the top waiter may be sleeping and the wakeup latency can be considerable. Allowing another ready lock waiter to steal the lock for productive use can improve system throughput.

There is no fairness in lock stealing, and I don't believe allowing each CPU to steal the lock exactly once is a worthwhile goal. It would just complicate the code.

On the other hand, unlimited lock stealing is bad, and we have to put a limit somewhere to ensure forward progress.
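To make the idea concrete, here is a minimal userspace sketch of the steal-limit mechanism being discussed: each top waiter carries a steal counter, and once the counter reaches the threshold, further stealing is refused and the lock is handed to the top waiter. The names (may_steal_lock, nr_steals, MAX_LOCK_STEALS) and the simplified structure are hypothetical illustrations, not the actual rtmutex implementation.

```c
#include <stdbool.h>

/* Hypothetical threshold mirroring the patch's arbitrary value of 32. */
#define MAX_LOCK_STEALS	32

struct waiter {
	int nr_steals;	/* times the lock has been stolen from this waiter */
};

/*
 * Decide whether an incoming non-RT locker may steal the lock from the
 * current top waiter. After MAX_LOCK_STEALS successful steals, refuse,
 * which forces a handoff to the top waiter and guarantees forward progress.
 */
static bool may_steal_lock(struct waiter *top_waiter)
{
	if (top_waiter->nr_steals >= MAX_LOCK_STEALS)
		return false;
	top_waiter->nr_steals++;
	return true;
}
```

The counter bounds the top waiter's worst-case wait to MAX_LOCK_STEALS lock hold times, while still letting ready waiters make productive use of the lock during the top waiter's wakeup latency.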

Cheers,
Longman