Re: rtmutex, pi_blocked_on, and blk_flush_plug()

From: Sebastian Andrzej Siewior
Date: Mon Feb 20 2023 - 06:42:30 EST


On 2023-02-20 12:04:56 [+0100], To Thomas Gleixner wrote:
> The ->pi_blocked_on field is set by __rwbase_read_lock() before
> schedule() is invoked while blocking on the sleeping lock. By doing this
> we avoid __blk_flush_plug() and as such may deadlock because we are
> going to sleep and made I/O progress earlier which is not globally
> visible but might be (s/might be/is/ in the deadlock case) expected by
> the owner of the lock.
>
> We could trylock and if this fails, flush and do the proper lock.
> This would ensure that we set pi_blocked_on after we flushed.
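
For illustration only, the ordering proposed in the quoted text (trylock, flush on failure, then take the slow path) can be modelled in plain user-space C. This is a minimal sketch, not kernel code: every model_* name is hypothetical, the plug is reduced to two counters, and the slow path only records that the task would sleep instead of actually blocking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blk_plug: just two counters. */
struct model_plug {
	int queued;	/* I/O batched locally, not yet visible to anyone else */
	int submitted;	/* I/O handed on to the (imaginary) device */
};

struct model_lock;

/* Hypothetical stand-in for the few task fields that matter here. */
struct model_task {
	struct model_plug *plug;
	const struct model_lock *pi_blocked_on;	/* set once we count as blocked */
	int queued_when_blocked;	/* plug->queued at the time it was set */
};

struct model_lock {
	const struct model_task *owner;
};

enum { MODEL_ACQUIRED = 0, MODEL_WOULD_SLEEP = 1 };

/* Models the flush: a NULL plug is fine, otherwise submit everything. */
void model_flush_plug(struct model_plug *plug)
{
	if (!plug)
		return;
	plug->submitted += plug->queued;
	plug->queued = 0;
}

bool model_trylock(struct model_lock *lock, const struct model_task *tsk)
{
	if (lock->owner)
		return false;
	lock->owner = tsk;
	return true;
}

int model_lock_acquire(struct model_lock *lock, struct model_task *tsk)
{
	/* A task blocks on at most one lock at a time. */
	assert(tsk->pi_blocked_on == NULL);

	/* Fast path: uncontended, we never sleep, so the plug stays untouched. */
	if (model_trylock(lock, tsk))
		return MODEL_ACQUIRED;

	/* Contended: submit pending I/O while we do not yet count as blocked. */
	model_flush_plug(tsk->plug);

	/* Only now record what we block on; the kernel would schedule() next. */
	tsk->pi_blocked_on = lock;
	tsk->queued_when_blocked = tsk->plug ? tsk->plug->queued : 0;
	return MODEL_WOULD_SLEEP;
}
```

In this model, a waiter with three requests in its plug that contends on a held lock gets MODEL_WOULD_SLEEP back, and nothing is left queued at the point where pi_blocked_on is set.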

Something like the diff below takes down_read() and down_write() into
account. read_lock()/write_lock() are excluded via the state check.
mutex_t is missing. The plug needs to be flushed before pi_blocked_on is
assigned, and before the wait lock is acquired:

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 728f434de2bbf..95731d0c9e87f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1700,6 +1700,13 @@ static __always_inline int __rt_mutex_lock(struct rt_mutex_base *lock,
 	if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current)))
 		return 0;
 
+	if (state != TASK_RTLOCK_WAIT) {
+		/*
+		 * If we are going to sleep and we have plugged IO queued,
+		 * make sure to submit it to avoid deadlocks.
+		 */
+		blk_flush_plug(current->plug, true);
+	}
 	return rt_mutex_slowlock(lock, NULL, state);
 }
 #endif /* RT_MUTEX_BUILD_MUTEX */
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index c201aadb93017..6c6c88a2d9228 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -143,6 +143,14 @@ static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
 	if (rwbase_read_trylock(rwb))
 		return 0;
 
+	if (state != TASK_RTLOCK_WAIT) {
+		/*
+		 * If we are going to sleep and we have plugged IO queued,
+		 * make sure to submit it to avoid deadlocks.
+		 */
+		blk_flush_plug(current->plug, true);
+	}
+
 	return __rwbase_read_lock(rwb, state);
 }
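
As a rough illustration of the state check in both hunks: the flush is skipped only for TASK_RTLOCK_WAIT, which is the state used for the rwlock_t substitution on PREEMPT_RT, while the rw_semaphore paths pass one of the regular sleep states. The sketch below is a user-space model with hypothetical model_* names and illustrative enum values, not the kernel's TASK_* constants; the caller table is an assumption about which state each API passes down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only; the real TASK_* bits live in <linux/sched.h>. */
enum model_state {
	MODEL_TASK_INTERRUPTIBLE,
	MODEL_TASK_UNINTERRUPTIBLE,
	MODEL_TASK_KILLABLE,
	MODEL_TASK_RTLOCK_WAIT,
};

/* Mirrors the "state != TASK_RTLOCK_WAIT" test from the diff. */
bool model_should_flush(enum model_state state)
{
	return state != MODEL_TASK_RTLOCK_WAIT;
}

/* Assumed mapping from locking API to the state it passes down. */
struct model_caller {
	const char *api;
	enum model_state state;
};

static const struct model_caller model_callers[] = {
	{ "down_read",          MODEL_TASK_UNINTERRUPTIBLE },
	{ "down_read_killable", MODEL_TASK_KILLABLE },
	{ "down_write",         MODEL_TASK_UNINTERRUPTIBLE },
	{ "read_lock",          MODEL_TASK_RTLOCK_WAIT },
};

/* True if the named API would flush the plug; false if not, or if unknown. */
bool model_api_flushes(const char *api)
{
	size_t i;

	assert(api != NULL);
	for (i = 0; i < sizeof(model_callers) / sizeof(model_callers[0]); i++) {
		if (strcmp(model_callers[i].api, api) == 0)
			return model_should_flush(model_callers[i].state);
	}
	return false;
}
```

Under these assumptions, model_api_flushes("down_read") is true and model_api_flushes("read_lock") is false, matching the split described above.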

Sebastian