[PATCH] locking/rtmutex: Flush the plug before entering the slowpath.

From: Sebastian Andrzej Siewior
Date: Wed Mar 22 2023 - 12:27:51 EST


blk_flush_plug() is invoked on schedule() to flush out any plugged IO so
that the progress made so far is globally visible. This is important in
order to avoid deadlocks because a lock owner can wait for the plugged
IO. Therefore the IO must be flushed before a thread can block on a
lock.
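
For reference, the scheduler-side flush happens roughly like this
(simplified call chain from kernel/sched/core.c; details vary between
versions):

  schedule()
    sched_submit_work(tsk)
      blk_flush_plug(tsk->plug, true)   <-- submit plugged IO before sleeping
    __schedule(SM_NONE)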

The plug flush routine can itself acquire a sleeping lock which is
contended. Blocking on a lock requires an assignment to
task_struct::pi_blocked_on. If blk_flush_plug() is invoked from the
slowpath via schedule() then the variable is already set and would be
overwritten by the nested lock acquisition in blk_flush_plug().
Therefore blk_flush_plug() needs to be invoked (blocking on any locks it
takes in the process) before blocking on the actual lock.
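
The problematic nesting, with hypothetical sleeping locks A and B for
illustration:

  rt_mutex_slowlock(A)
    task_blocks_on_rt_mutex(A)
      current->pi_blocked_on = &waiter_A
    rt_mutex_slowlock_block(A)
      schedule()
        sched_submit_work()
          blk_flush_plug()
            rt_mutex_slowlock(B)                 <-- contended sleeping lock
              current->pi_blocked_on = &waiter_B <-- overwrites waiter_A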

Invoke blk_flush_plug() before blocking on a sleeping lock. The
PREEMPT_RT-only sleeping locks (spinlock_t and rwlock_t) are excluded
because their slowpath does not invoke blk_flush_plug().

Fixes: e17ba59b7e8e1 ("locking/rtmutex: Guard regular sleeping locks specific functions")
Reported-by: Crystal Wood <swood@xxxxxxxxxx>
Link: https://lore.kernel.org/4b4ab374d3e24e6ea8df5cadc4297619a6d945af.camel@xxxxxxxxxx
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
---
On 2023-02-20 19:21:51 [+0100], Thomas Gleixner wrote:
>
> This still leaves the problem vs. io_wq_worker_sleeping() and it's
> running() counterpart after schedule().

io_wq_worker_sleeping() has a kfree() in it so it probably should be
moved, too.
io_wq_worker_running() is just an OR and an INC and is fine.
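
For reference, io_wq_worker_running() is currently roughly the
following (io_uring/io-wq.c, abridged):

void io_wq_worker_running(struct task_struct *tsk)
{
	struct io_worker *worker = tsk->worker_private;

	if (!worker)
		return;
	if (!(worker->flags & IO_WORKER_F_UP))
		return;
	if (worker->flags & IO_WORKER_F_RUNNING)
		return;
	/* The OR ... */
	worker->flags |= IO_WORKER_F_RUNNING;
	/* ... and the INC. Nothing here can block. */
	io_wq_inc_running(worker);
}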

> Aside of that for CONFIG_DEBUG_RT_MUTEXES=y builds it flushes on every
> lock operation whether the lock is contended or not.

Only for mutex & ww_mutex operations; rwsem is not affected by
CONFIG_DEBUG_RT_MUTEXES. For mutex it could be mitigated by invoking
try_to_take_rt_mutex() before blk_flush_plug().
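
An untested sketch of that mitigation (try_to_take_rt_mutex() needs
lock->wait_lock to be held):

	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
		return 0;

	/* Sketch: try a real acquisition before paying for the flush. */
	raw_spin_lock_irq(&lock->wait_lock);
	if (try_to_take_rt_mutex(lock, current, NULL)) {
		raw_spin_unlock_irq(&lock->wait_lock);
		return 0;
	}
	raw_spin_unlock_irq(&lock->wait_lock);

	blk_flush_plug(current->plug, true);
	return rt_mutex_slowlock(lock, NULL, state);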

 kernel/locking/rtmutex.c     | 7 +++++++
 kernel/locking/rwbase_rt.c   | 8 ++++++++
 kernel/locking/ww_rt_mutex.c | 5 +++++
 3 files changed, 20 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 728f434de2bbf..c1bc2cb1522cb 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -23,6 +23,7 @@
 #include <linux/sched/rt.h>
 #include <linux/sched/wake_q.h>
 #include <linux/ww_mutex.h>
+#include <linux/blkdev.h>
 
 #include <trace/events/lock.h>
 
@@ -1700,6 +1701,12 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
 
+	/*
+	 * If we are going to sleep and we have plugged IO queued, make sure to
+	 * submit it to avoid deadlocks.
+	 */
+	blk_flush_plug(current->plug, true);
+
 	return rt_mutex_slowlock(lock, NULL, state);
 }
 #endif /* RT_MUTEX_BUILD_MUTEX */
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index c201aadb93017..70c08ec4ad8af 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -143,6 +143,14 @@ static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
 	if (rwbase_read_trylock(rwb))
 		return 0;
 
+	if (state != TASK_RTLOCK_WAIT) {
+		/*
+		 * If we are going to sleep and we have plugged IO queued,
+		 * make sure to submit it to avoid deadlocks.
+		 */
+		blk_flush_plug(current->plug, true);
+	}
+
 	return __rwbase_read_lock(rwb, state);
 }
 
diff --git a/kernel/locking/ww_rt_mutex.c b/kernel/locking/ww_rt_mutex.c
index d1473c624105c..472e3622abf09 100644
--- a/kernel/locking/ww_rt_mutex.c
+++ b/kernel/locking/ww_rt_mutex.c
@@ -67,6 +67,11 @@ __ww_rt_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ww_ctx,
 		ww_mutex_set_context_fastpath(lock, ww_ctx);
 		return 0;
 	}
+	/*
+	 * If we are going to sleep and we have plugged IO queued, make sure to
+	 * submit it to avoid deadlocks.
+	 */
+	blk_flush_plug(current->plug, true);
 
 	ret = rt_mutex_slowlock(&rtm->rtmutex, ww_ctx, state);
 
--
2.40.0