Re: [PATCH] sched/core: Schedule new worker even if PI-blocked

From: Sebastian Andrzej Siewior
Date: Tue Aug 20 2019 - 11:54:07 EST


On 2019-08-20 17:20:25 [+0200], Peter Zijlstra wrote:
> > There isc RCU (boosting) and futex. I'm sceptical about the i2c usersâ
>
> Well, yes, I too was/am sceptical, but it was tglx who twisted my arm
> and said the i2c people were right and rt_mutex is/should-be a generic
> usable interface.

I don't mind the generic interface I just find the use-case odd. So by
now rtmutex is used by i2c core and not a single driver like it the case
the last time I looked at it. But still, why is it (PI-boosting)
important for I2C to use it and not for other subsystems? Moving onâ

> > > > --- a/kernel/sched/core.c
> > > > +++ b/kernel/sched/core.c
> > > > @@ -3945,7 +3945,7 @@ void __noreturn do_task_dead(void)
> > > >
> > > > static inline void sched_submit_work(struct task_struct *tsk)
> > > > {
> > > > - if (!tsk->state || tsk_is_pi_blocked(tsk))
> > > > + if (!tsk->state)
> > > > return;
> > > >
> > > > /*
>
> So this part actually makes rt_mutex less special and is good.
>
> > > > @@ -3961,6 +3961,9 @@ static inline void sched_submit_work(str
> > > > preempt_enable_no_resched();
> > > > }
> > > >
> > > > + if (tsk_is_pi_blocked(tsk))
> > > > + return;
> > > > +
> > > > /*
> > > > * If we are going to sleep and we have plugged IO queued,
> > > > * make sure to submit it to avoid deadlocks.
> > >
> > > What do we need that clause for? Why is pi_blocked special _at_all_?
> >
> > so !RT the scheduler does nothing special if a task blocks on sleeping
> > lock.
> > If I remember correctly then blk_schedule_flush_plug() is the problem.
> > It may require a lock which is held by the task.
> > It may hold A and wait for B while another task has B and waits for A.
> > If my memory does bot betray me then ext+jbd can lockup without this.
>
> And am I right in thinking that that, again, is specific to the
> sleeping-spinlocks from PREEMPT_RT? Is there really nothing else that
> identifies those more specifically? It's been a while since I looked at
> them.

Not really. I hacked "int sleeping_lock" into task_struct which is
incremented each time a "sleeping lock" version of rtmutex is requested.
We have two users as of now:
- RCU, which checks if we schedule() while holding rcu_read_lock() which
is okay if it is a sleeping lock.

- NOHZ's pending softirq detection while going to idle. It is possible
that "ksoftirqd" and "current" are blocked on locks and the CPU goes
to idle (because nothing else is runnable) with pending softirqs.

I wanted to let rtmutex invoke another schedule() function in case of a
sleeping lock to avoid the RCU warning. This would avoid incrementing
"sleeping_lock" in the fast path. But then I had no idea what to do with
the NOHZ thing.

> Also, I suppose it would be really good to put that in a comment.
So, what does that mean for that patch. According to my inbox it has
applied to an "urgent" branch. Do I resubmit the whole thing or just a
comment on top?

Sebastian