Re: Pinning down a blocked task to extract diagnostics

From: Paul E. McKenney
Date: Thu Mar 05 2020 - 09:22:48 EST


On Thu, Mar 05, 2020 at 09:13:37AM +0100, Peter Zijlstra wrote:
> On Thu, Mar 05, 2020 at 09:07:55AM +0100, Peter Zijlstra wrote:
> > On Wed, Mar 04, 2020 at 04:50:49PM -0800, Paul E. McKenney wrote:
> > > Hello!
> > >
> > > Suppose that I need to extract diagnostics information from a blocked
> > > task, but that I absolutely cannot tolerate this task awakening in the
> > > midst of this extraction process. Is the following code the right way
> > > to make this work given a task "t"?
> > >
> > > 	raw_spin_lock_irq(&t->pi_lock);
> > > 	if (t->on_rq) {
> > > 		/* Task no longer blocked, so ignore it. */
> > > 	} else {
> > > 		/* Extract consistent diagnostic information. */
> > > 	}
> > > 	raw_spin_unlock_irq(&t->pi_lock);
> > >
> > > It looks like all the wakeup paths acquire ->pi_lock, but I figured I
> > > should actually ask...
> >
> > Close, the thing pi_lock actually guards is the t->state transition *to*
> > TASK_WAKING/TASK_RUNNING, so something like this:
>
> Almost, we must indeed also check ->on_rq, otherwise it might change the
> state back itself.
>
> >
> > 	raw_spin_lock_irq(&t->pi_lock);
> > 	switch (t->state) {
> > 	case TASK_RUNNING:
> > 	case TASK_WAKING:
> > 		/* ignore */
> > 		break;
> >
> > 	default:
> 		if (t->on_rq)
> 			break;
>
> > 		/* Extract consistent diagnostic information. */
> > 		break;
> > 	}
> > 	raw_spin_unlock_irq(&t->pi_lock);
> >
> > ought to work. But if you're going to do this, please add a reference to
> > that code in a comment on top of try_to_wake_up(), such that we can
> > later find all the code that relies on this.

How about if I add something like this, located right by try_to_wake_up()?

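	/*
	 * Returns true with t->pi_lock held and interrupts disabled if the
	 * task is blocked, in which case the caller must release the lock.
	 * Returns false with the lock already released otherwise.
	 */
	bool try_to_keep_sleeping(struct task_struct *t)
	{
		raw_spin_lock_irq(&t->pi_lock);
		switch (t->state) {
		case TASK_RUNNING:
		case TASK_WAKING:
			raw_spin_unlock_irq(&t->pi_lock);
			return false;

		default:
			if (t->on_rq) {
				raw_spin_unlock_irq(&t->pi_lock);
				return false;
			}

			/* OK to extract consistent diagnostic information. */
			return true;
		}
		/* NOTREACHED */
	}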

Then a use might look like this:

	if (try_to_keep_sleeping(t)) {
		/* Extract consistent diagnostic information. */
		raw_spin_unlock_irq(&t->pi_lock);
	} else {
		/* Woo-hoo!  It started running again!!! */
	}
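
In case it helps to see the lock handoff spelled out, here is a rough
userspace sketch of the calling convention. Everything prefixed with
model_ is hypothetical, the C11 atomic_flag spinlock merely stands in
for ->pi_lock, and nothing here models interrupt disabling or the real
scheduler. The only point is that a true return leaves the lock held,
so a waker that also takes the lock cannot change the state until the
caller releases it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the task_struct fields discussed above. */
enum model_state { MODEL_RUNNING, MODEL_WAKING, MODEL_BLOCKED };

struct model_task {
	atomic_flag pi_lock;		/* stands in for t->pi_lock */
	enum model_state state;		/* stands in for t->state */
	int on_rq;			/* stands in for t->on_rq */
};

static void model_task_init(struct model_task *t, enum model_state state,
			    int on_rq)
{
	atomic_flag_clear(&t->pi_lock);
	t->state = state;
	t->on_rq = on_rq;
}

/* Spin until the flag is acquired, like a (very) simple spinlock. */
static void model_lock(struct model_task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->pi_lock,
						 memory_order_acquire))
		;
}

static void model_unlock(struct model_task *t)
{
	atomic_flag_clear_explicit(&t->pi_lock, memory_order_release);
}

/* Returns true with the lock held if blocked, false with it released. */
static bool model_try_to_keep_sleeping(struct model_task *t)
{
	model_lock(t);
	if (t->state == MODEL_RUNNING || t->state == MODEL_WAKING || t->on_rq) {
		model_unlock(t);
		return false;
	}
	return true;
}

/* Modeled wakeup: changes state only while holding the lock. */
static void model_wake_up(struct model_task *t)
{
	model_lock(t);
	t->state = MODEL_RUNNING;
	t->on_rq = 1;
	model_unlock(t);
}

/* Example caller: returns the sampled state, or -1 if not blocked. */
static int model_sample_blocked_state(struct model_task *t)
{
	int snapshot;

	if (!model_try_to_keep_sleeping(t))
		return -1;
	snapshot = (int)t->state;	/* Wakeups are excluded here. */
	assert(!t->on_rq);
	model_unlock(t);		/* Caller's half of the contract. */
	return snapshot;
}
```

The asymmetric return (lock held on true, released on false) is the
part most likely to trip up callers, which is one argument for
documenting it right next to the function.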

Is there a better way to approach this?

Thanx, Paul