Re: rtmutex, pi_blocked_on, and blk_flush_plug()

From: Sebastian Andrzej Siewior
Date: Mon Feb 20 2023 - 06:05:54 EST


On 2023-02-20 10:49:26 [+0100], Thomas Gleixner wrote:
> > The logic is different but the deadlock should be avoided:
> > - mutex_t and rw_semaphore invoke schedule() while blocking on a lock.
> > As part of schedule() sched_submit_work() is invoked.
> > This is the same in RT and !RT so I don't expect any deadlock since
> > the involved locks are the same.
>
> Huch?
>
> xlog_cil_commit()
> down_read(&cil->xc_ctx_lock)
> __rwbase_read_lock()
> __rt_mutex_slowlock()
> current->pi_blocked_on = ...
> schedule()
> __blk_flush_plug()
> dd_insert_requests()
> rt_spin_lock()
> WARN_ON(current->pi_blocked_on);
>
> So something like the below is required. But that might not cut it
> completely. wq_worker_sleeping() is fine, but I'm not convinced that
> io_wq_worker_sleeping() is safe. That needs some investigation.

Okay, so this makes sense.

> Thanks,
>
> tglx
> ---
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6666,6 +6666,9 @@ static inline void sched_submit_work(str
> */
> SCHED_WARN_ON(current->__state & TASK_RTLOCK_WAIT);
>
> + if (current->pi_blocked_on)
> + return;
> +

The ->pi_blocked_on field is set by __rwbase_read_lock() before
schedule() is invoked while blocking on the sleeping lock. With the
early return above we skip __blk_flush_plug() and may therefore
deadlock: we go to sleep with I/O queued on the plug list earlier,
which is not globally visible but might be (s/might be/is/ in the
deadlock case) expected to complete by the owner of the lock.
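
For reference, the ordering in the slowpath (simplified from
kernel/locking/rtmutex.c, the inline comments are mine):

  __rt_mutex_slowlock()
    task_blocks_on_rt_mutex()
      raw_spin_lock(&task->pi_lock);
      task->pi_blocked_on = waiter;	/* set before schedule() */
      raw_spin_unlock(&task->pi_lock);
    rt_mutex_slowlock_block()
      for (;;) {
            ...
            schedule();		/* with the early return the plug is
				   no longer flushed from here on */
            ...
      }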

We could trylock first and, if that fails, flush the plug and then take
the lock via the proper slowpath. This would ensure that
->pi_blocked_on is only set after we flushed.
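Something like the below for the reader side in
kernel/locking/rwbase_rt.c (completely untested, just to illustrate the
idea; it relies on the early return from your patch and needs
linux/blkdev.h for blk_flush_plug()):

  static __always_inline int rwbase_read_lock(struct rwbase_rt *rwb,
  					      unsigned int state)
  {
  	if (rwbase_read_trylock(rwb))
  		return 0;

  	/*
  	 * Flush the plug while ->pi_blocked_on is still NULL, so the
  	 * flush callbacks can still acquire sleeping locks. The
  	 * slowpath sets ->pi_blocked_on before schedule() and
  	 * sched_submit_work() will then skip the flush.
  	 */
  	blk_flush_plug(current->plug, true);

  	return __rwbase_read_lock(rwb, state);
  }

The write side and the mutex slowpath would need the same treatment.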

> /*
> * If we are going to sleep and we have plugged IO queued,
> * make sure to submit it to avoid deadlocks.

Sebastian