Re: [PATCH v5] kernel/fork: beware of __put_task_struct calling context

From: Wander Lairson Costa
Date: Fri Feb 24 2023 - 12:05:48 EST


On Wed, Feb 15, 2023 at 12:42:46PM +0100, Sebastian Andrzej Siewior wrote:
> On 2023-02-13 09:13:55 [-0300], Wander Lairson Costa wrote:
>
> > > > diff --git a/kernel/fork.c b/kernel/fork.c
> > > > index 9f7fe3541897..9bf30c725ed8 100644
> > > > --- a/kernel/fork.c
> > > > +++ b/kernel/fork.c
> > > > @@ -857,6 +857,37 @@ void __put_task_struct(struct task_struct *tsk)
> > > > sched_core_free(tsk);
> > > > free_task(tsk);
> > > > }
> > > > +
> > > > +static void __put_task_struct_rcu(struct rcu_head *rhp)
> > > > +{
> > > > + struct task_struct *task = container_of(rhp, struct task_struct, rcu);
> > > > +
> > > > + ___put_task_struct(task);
> > > > +}
> > > > +
> > > > +void __put_task_struct(struct task_struct *tsk)
> > > > +{
> > > > + if (IS_ENABLED(CONFIG_PREEMPT_RT) && (!preemptible() || !in_task()))
> > >
> > > No. If you do this on a non-RT kernel with CONFIG_PROVE_RAW_LOCK_NESTING
> > > then it will complain. And why do we have in_task() here?
> > >
> >
> > Initially I thought you were saying it would cause a build failure, but
> > I built the kernel successfully with CONFIG_PROVE_RAW_LOCK_NESTING.
> > If it is a non-RT kernel, I understand the optimizer will throw the
> > `if` clause away. Would you mind further explaining the conflict with
> > CONFIG_PROVE_RAW_LOCK_NESTING?
>
> Documentation/locking/locktypes.rst explains the individual lock types
> we have in the kernel and how you should nest them. In short,
>
> mutex_t -> spinlock_t -> raw_spinlock_t
>
> You nest/acquire them left to right, i.e. first the mutex_t, last the
> raw_spinlock_t. This always works. If you leave PREEMPT_RT out of the
> picture, then
> raw_spinlock_t -> spinlock_t
> and
> spinlock_t -> raw_spinlock_t
>
> make no difference, because the underlying lock structure is the same and
> the behaviour is the same. It only causes a warning or a boom once
> PREEMPT_RT is enabled.
> CONFIG_PROVE_RAW_LOCK_NESTING performs exactly this kind of
> verification, so you can see on a !PREEMPT_RT kernel whether there is a
> locking chain (or nesting) that would not be okay on PREEMPT_RT.
>
> In this case, at the time you do __put_task_struct() the sched-RQ lock
> is held which is a raw_spinlock_t. Later in __put_task_struct() it will
> free memory (or do something else) requiring a spinlock_t which would do
> the nesting
> raw_spinlock_t -> spinlock_t
>
> which is invalid and so lockdep should yell here.

Thanks for the detailed explanation!
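
Just to check that I got the rule right: a minimal, purely hypothetical
example of the nesting CONFIG_PROVE_RAW_LOCK_NESTING is meant to flag
would be something like

```
static DEFINE_RAW_SPINLOCK(outer_raw_lock);
static DEFINE_SPINLOCK(inner_lock);

static void bad_nesting_example(void)
{
	raw_spin_lock(&outer_raw_lock);
	/*
	 * raw_spinlock_t -> spinlock_t: invalid on PREEMPT_RT, where the
	 * inner lock becomes a sleeping lock, so lockdep can complain
	 * about it even on a !PREEMPT_RT build with PROVE_RAW_LOCK_NESTING.
	 */
	spin_lock(&inner_lock);
	spin_unlock(&inner_lock);
	raw_spin_unlock(&outer_raw_lock);
}
```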

>
> > The `!in_task()` call is to test if we are in interrupt context.
>
> I am aware of this, but in terms of PREEMPT_RT it doesn't matter here.
> It excludes the hardirq context, which is the important one, but that is
> also covered by preemptible(). It additionally excludes the "serving"
> softirq context, which is fine because it is preemptible on PREEMPT_RT.
>

Indeed, you are right, the !in_task() is unneeded.
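
So, dropping the in_task() test, my understanding is that the check
reduces to !preemptible() only, roughly like this (untested sketch on top
of the v5 patch, reusing its ___put_task_struct()/__put_task_struct_rcu()
helpers):

```
void __put_task_struct(struct task_struct *tsk)
{
	/*
	 * On PREEMPT_RT, don't free from a non-preemptible context
	 * (hardirq is already covered, since preemptible() is false
	 * there); defer the work to an RCU callback instead.
	 */
	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible()) {
		call_rcu(&tsk->rcu, __put_task_struct_rcu);
		return;
	}

	___put_task_struct(tsk);
}
```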

> > > If Oleg does not want the unconditional RCU then I would prefer an
> > > explicit put task which delays it to RCU for the few users that need it.
> > >
> >
> > Do you mean like the approach in v2[1]? I believe that, to spot all
> > possible problematic scenarios, we should add
>
> Yes, an explicit function, because you know the context in which put_.*()
> is invoked. It wasn't audited at the time it was added; it is not the
> "regular" case.
>
> > ```
> > if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > might_sleep();
> > ```
> >
> > to `put_task_struct()`.
>
> This only works on PREEMPT_RT and should be enough to spot some of the
> offenders we have right now. It might also trigger if task::state is
> changed (not TASK_RUNNING) even where it would otherwise be fine.
> Therefore I would suggest using rtlock_might_resched() for testing,
> which is in kernel/locking/spinlock_rt.c, but you get the idea.
>
> Longterm, something like the diff at the bottom might compile and will
> show the raw_spinlock_t -> spinlock_t nesting with
> CONFIG_PROVE_RAW_LOCK_NESTING. We won't catch explicit
> preempt_disable() or local_irq_disable() users, but it _should_ be
> enough, and it would have warned us in this case because:
> - the scheduler acquires a raw_spinlock_t
> - the hrtimer code has a check for this in lockdep_hrtimer_enter() to
> distinguish between timers which are "regular" and those which
> explicitly ask for the hardirq context.
>
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index 357e0068497c1..eedbd50eb5df3 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -113,14 +113,18 @@ static inline struct task_struct *get_task_struct(struct task_struct *t)
>
> extern void __put_task_struct(struct task_struct *t);
>
> +extern spinlock_t task_put_lock;
> +
> static inline void put_task_struct(struct task_struct *t)
> {
> + might_lock(&task_put_lock);
> if (refcount_dec_and_test(&t->usage))
> __put_task_struct(t);
> }
>
> static inline void put_task_struct_many(struct task_struct *t, int nr)
> {
> + might_lock(&task_put_lock);
> if (refcount_sub_and_test(nr, &t->usage))
> __put_task_struct(t);
> }
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 9f7fe35418978..2f9c09bc22bdb 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -840,6 +840,8 @@ static inline void put_signal_struct(struct signal_struct *sig)
> free_signal_struct(sig);
> }
>
> +DEFINE_SPINLOCK(task_put_lock);
> +
> void __put_task_struct(struct task_struct *tsk)
> {
> WARN_ON(!tsk->exit_state);
>

I tried this, but it doesn't give the splat on !PREEMPT_RT. But IIUC,
CONFIG_PROVE_RAW_LOCK_NESTING only works if we hold a raw_spinlock_t
and try to acquire a spinlock_t. Does it check irq context as well?
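
For reference, what I had in mind when testing was roughly the two cases
below (hypothetical test code, not the exact thing I ran), with your
might_lock(&task_put_lock) annotation applied:

```
static DEFINE_RAW_SPINLOCK(test_raw_lock);
static struct task_struct *test_task;	/* reference taken elsewhere */

/*
 * Case 1: put under an explicit raw_spinlock_t. Here I would expect
 * lockdep to report the raw_spinlock_t -> spinlock_t nesting via the
 * might_lock() annotation, even on !PREEMPT_RT.
 */
static void put_under_raw_lock(void)
{
	raw_spin_lock(&test_raw_lock);
	put_task_struct(test_task);
	raw_spin_unlock(&test_raw_lock);
}

/*
 * Case 2: put from hardirq context with no raw_spinlock_t held. This
 * is the case I am not sure CONFIG_PROVE_RAW_LOCK_NESTING covers,
 * hence the question above.
 */
static irqreturn_t put_from_irq(int irq, void *data)
{
	put_task_struct(test_task);
	return IRQ_HANDLED;
}
```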