Re: [RESEND PATCH v4] sched: do not call __put_task_struct() on rt if pi_blocked_on is set

From: Luis Claudio R. Goncalves
Date: Tue Jun 17 2025 - 09:20:22 EST


On Tue, Jun 17, 2025 at 11:36:27AM +0200, Sebastian Andrzej Siewior wrote:
> On 2025-06-17 11:26:09 [+0200], Peter Zijlstra wrote:
> > On Fri, Jun 13, 2025 at 12:05:14PM -0300, Luis Claudio R. Goncalves wrote:
> > > With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
> > > from rt_mutex_adjust_prio_chain() could happen in preemptible context and
> > > with a mutex enqueued. That could lead to this sequence:
> > >
> > > rt_mutex_adjust_prio_chain()
> > >   put_task_struct()
> > >     __put_task_struct()
> > >       sched_ext_free()
> > >         spin_lock_irqsave()
> > >           rtlock_lock() ---> TRIGGERS
> > >                              lockdep_assert(!current->pi_blocked_on);
> > >
> > > Fix that by unconditionally resorting to the deferred call to
> > > __put_task_struct() if PREEMPT_RT is enabled.
> > >
> >
> > Should this have a Fixes: tag and go into /urgent?
>
> I would say so. I'm not sure what caused it. I think Luis said at some
> point that it is caused by a sched_ext case or I mixed it up with
> something. Luis?

You are correct, all the initial cases we observed were triggered at
sched_ext_free(). Later, Crystal Wood was able to pinpoint the real
problem, __put_task_struct() being called by an RT task with a mutex
enqueued. With that in mind we were able to identify other cases with
a similar cause.
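
For illustration only (this is not the patch itself): a minimal user-space
sketch of the deferral pattern the commit message describes, where the last
reference drop queues the release instead of running it inline. Every name
here (struct task, model_preempt_rt, put_task(), run_deferred()) is
hypothetical and merely stands in for the kernel counterparts; the real
code uses the kernel's own refcount and deferral machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct task_struct. */
struct task {
	int usage;          /* reference count */
	bool freed;         /* set once the release path has run */
	struct task *next;  /* link in the deferred-release list */
};

/* Assumption: a runtime toggle standing in for CONFIG_PREEMPT_RT. */
static bool model_preempt_rt = true;

/* Tasks whose final release has been deferred. */
static struct task *deferred_head;

/* Stand-in for __put_task_struct(): the part that may take sleeping locks. */
static void release_task(struct task *t)
{
	t->freed = true;
}

/* Drop one reference; on the last one, release inline or defer. */
static void put_task(struct task *t)
{
	if (--t->usage != 0)
		return;

	if (model_preempt_rt) {
		/* Defer: the caller may still be enqueued on a PI chain. */
		t->next = deferred_head;
		deferred_head = t;
		return;
	}

	release_task(t);
}

/* Stand-in for the deferred callback, run later from a safe context. */
static int run_deferred(void)
{
	int released = 0;

	while (deferred_head) {
		struct task *t = deferred_head;

		deferred_head = t->next;
		release_task(t);
		released++;
	}
	return released;
}
```

In this model, the final put_task() never runs release_task() itself when
model_preempt_rt is set; the release only happens once run_deferred() is
called, which is the property that keeps the sleeping-lock path away from
a caller that has pi_blocked_on set.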

> The other question I have, do we need to distinguish between PREEMPT_RT
> and not or can we do this unconditionally?

After you mentioned that idea in the v2 thread, I ran stress tests (LTP,
stress-ng, perf bench all in a tight loop, ...) and a few benchmarks on
kernels with and without PREEMPT_RT enabled, and with and without lockdep.
Everything worked fine. However, without a specific benchmark to confirm
that the patch adds no penalty, I was not confident enough to suggest
making the change unconditional.

Luis

> Sebastian
>