Re: Pinning down a blocked task to extract diagnostics

From: Steven Rostedt
Date: Thu Mar 05 2020 - 09:28:50 EST


On Thu, 5 Mar 2020 06:22:45 -0800
"Paul E. McKenney" <paulmck@xxxxxxxxxx> wrote:

> On Thu, Mar 05, 2020 at 09:13:37AM +0100, Peter Zijlstra wrote:
> > On Thu, Mar 05, 2020 at 09:07:55AM +0100, Peter Zijlstra wrote:
> > > On Wed, Mar 04, 2020 at 04:50:49PM -0800, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Suppose that I need to extract diagnostics information from a blocked
> > > > task, but that I absolutely cannot tolerate this task awakening in the
> > > > midst of this extraction process. Is the following code the right way
> > > > to make this work given a task "t"?
> > > >
> > > > 	raw_spin_lock_irq(&t->pi_lock);
> > > > 	if (t->on_rq) {
> > > > 		/* Task no longer blocked, so ignore it. */
> > > > 	} else {
> > > > 		/* Extract consistent diagnostic information. */
> > > > 	}
> > > > 	raw_spin_unlock_irq(&t->pi_lock);
> > > >
> > > > It looks like all the wakeup paths acquire ->pi_lock, but I figured I
> > > > should actually ask...
> > >
> > > Close, the thing pi_lock actually guards is the t->state transition *to*
> > > TASK_WAKING/TASK_RUNNING, so something like this:
> >
> > Almost, we must indeed also check ->on_rq, otherwise it might change the
> > state back itself.
> >
> > >
> > > 	raw_spin_lock_irq(&t->pi_lock);
> > > 	switch (t->state) {
> > > 	case TASK_RUNNING:
> > > 	case TASK_WAKING:
> > > 		/* ignore */
> > > 		break;
> > >
> > > 	default:
> > 		if (t->on_rq)
> > 			break;
> >
> > > 		/* Extract consistent diagnostic information. */
> > > 		break;
> > > 	}
> > > 	raw_spin_unlock_irq(&t->pi_lock);
> > >
> > > ought to work. But if you're going to do this, please add a reference to
> > > that code in a comment on top of try_to_wake_up(), such that we can
> > > later find all the code that relies on this.
>
> How about if I add something like this, located right by try_to_wake_up()?
>
> bool try_to_keep_sleeping(struct task_struct *t)
> {
> 	raw_spin_lock_irq(&t->pi_lock);
> 	switch (t->state) {
> 	case TASK_RUNNING:
> 	case TASK_WAKING:
> 		raw_spin_unlock_irq(&t->pi_lock);
> 		return false;
>
> 	default:
> 		if (t->on_rq) {

Somehow I think there still needs to be a read barrier before the test of
on_rq.

> 			raw_spin_unlock_irq(&t->pi_lock);
> 			return false;
> 		}
>
> 		/* OK to extract consistent diagnostic information. */
> 		return true;
> 	}
> 	/* NOTREACHED */
> }
>
> Then a use might look like this:
>
> 	if (try_to_keep_sleeping(t)) {
> 		/* Extract consistent diagnostic information. */
> 		raw_spin_unlock_irq(&t->pi_lock);

Perhaps we should have an allow_awake(t) to match it?

allow_awake(t);

Where we have:

static inline void allow_awake(struct task_struct *t)
{
	raw_spin_unlock_irq(&t->pi_lock);
}

-- Steve

> 	} else {
> 		/* Woo-hoo! It started running again!!! */
> 	}
>
> Is there a better way to approach this?
>
> Thanx, Paul
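
To make the pairing concrete, here is a rough userspace mock of the try_to_keep_sleeping()/allow_awake() pattern discussed above. It is only a sketch and not kernel code: struct mock_task, the MOCK_* states, mock_lock()/mock_unlock() and extract_diagnostics() are all made-up stand-ins, the spinlock is a plain C11 atomic_flag with no interrupt disabling, and the acquire load of on_rq merely models the read barrier suggested above (whether the real code needs one is still an open question).

```c
/* Userspace mock, NOT kernel code: every type and name here is a stand-in. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum mock_state { MOCK_TASK_RUNNING, MOCK_TASK_WAKING, MOCK_TASK_INTERRUPTIBLE };

struct mock_task {
	atomic_flag pi_lock;	/* stand-in for t->pi_lock */
	enum mock_state state;	/* stand-in for t->state */
	atomic_int on_rq;	/* stand-in for t->on_rq */
};

static void mock_lock(struct mock_task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->pi_lock, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void mock_unlock(struct mock_task *t)
{
	atomic_flag_clear_explicit(&t->pi_lock, memory_order_release);
}

/*
 * Returns true with pi_lock HELD if the task looks blocked,
 * false with pi_lock released otherwise.
 */
static bool try_to_keep_sleeping(struct mock_task *t)
{
	mock_lock(t);
	switch (t->state) {
	case MOCK_TASK_RUNNING:
	case MOCK_TASK_WAKING:
		break;
	default:
		/* Acquire load models the suggested read barrier (assumption). */
		if (!atomic_load_explicit(&t->on_rq, memory_order_acquire))
			return true;
		break;
	}
	mock_unlock(t);
	return false;
}

/* Counterpart of a successful try_to_keep_sleeping(): drop pi_lock. */
static void allow_awake(struct mock_task *t)
{
	mock_unlock(t);
}

/* Hypothetical caller: returns 1 if diagnostics were extracted, else 0. */
static int extract_diagnostics(struct mock_task *t)
{
	if (!try_to_keep_sleeping(t))
		return 0;	/* It started running again. */
	/* Extract consistent diagnostic information here. */
	allow_awake(t);
	return 1;
}
```

The point of the pairing is that the caller never touches ->pi_lock directly: a true return means the lock is held and must be handed back with allow_awake(), a false return means there is nothing to undo.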