Re: rcu: endless stalls

From: Paul E. McKenney
Date: Thu Jun 14 2012 - 12:51:32 EST


On Thu, Jun 14, 2012 at 09:45:32AM +0200, Mike Galbraith wrote:
> On Wed, 2012-06-13 at 09:12 +0200, Mike Galbraith wrote:
> > On Wed, 2012-06-13 at 07:56 +0200, Mike Galbraith wrote:
> >
> > > Question remains though. Maybe the box hit some other problem that led
> > > to death by RCU gripage, but the info I received indicated the box was
> > > in the midst of a major spin-fest.
> >
> > To (maybe) speak more clearly, since it's a mutex like any other mutex
> > that loads of CPUs can hit if you've got loads of CPUs, did huge box
> > driver do something that we don't expect so many CPUs to be doing, thus
> > instigate simultaneous exit trouble (ie shoot self in foot), or did that
> > mutex addition create the exit trouble which box appeared to be having?
>
> Crickets chirping.. I know what _that_ means: "tsk tsk, you dummy" :)

In my case, it means that I suspect you would rather me continue working
on my series of patches to further reduce RCU grace-period-initialization
latencies on large systems than worry about this patch.

If I understand correctly, your patch did what you wanted in the
situation at hand. I have some quibbles, please see below.

> I suspected that would happen, but asked anyway because I couldn't
> imagine even 4096 CPUs getting tangled up for an _eternity_ trying to go
> to sleep, but the lock which landed after 32-stable where these beasts
> earn their daily fuel rods was splattered all over the event. Oh well.
>
> So, I can forget that and just make the thing not gripe itself to death
> should a stall for whatever reason be encountered again.
>
> Rather than mucking about with rcu_cpu_stall_suppress, how about adjust
> timeout as you proceed, and block report functions? That way, there's
> no fiddling with things used elsewhere, and it shouldn't matter how
> badly console is being be hammered, you get a full report, and maybe
> even only one.

That sounds like a very good interim approach to me!

> Hm, maybe I should forget hoping to keep check_cpu_stall() happy too,
> and only silently ignore it when busy.

But this is a good way to accumulate a variety of stalls, so not
recommended.

Thanx, Paul

> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 0da7b88..e9dd654 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -727,24 +727,29 @@ static void record_gp_stall_check_time(struct rcu_state *rsp)
> rsp->jiffies_stall = jiffies + jiffies_till_stall_check();
> }
>
> +int rcu_stall_report_in_progress;
> +
> static void print_other_cpu_stall(struct rcu_state *rsp)
> {
> int cpu;
> long delta;
> unsigned long flags;
> int ndetected;
> - struct rcu_node *rnp = rcu_get_root(rsp);
> + struct rcu_node *root = rcu_get_root(rsp);

s/root/rnp_root/, please -- consistency with other places in the code.
But see below.

> + struct rcu_node *rnp;
>
> /* Only let one CPU complain about others per time interval. */
>
> - raw_spin_lock_irqsave(&rnp->lock, flags);
> + raw_spin_lock_irqsave(&root->lock, flags);

On a 4096-CPU system, I bet that this needs to be trylock, with
a bare "return" on failure to acquire the lock.

Of course, that fails if someone is holding this lock for some other
reason. So I believe that you need a separate ->stalllock in the
rcu_state structure for this purpose. You won't need to disable irqs
when acquiring the lock.

> delta = jiffies - rsp->jiffies_stall;
> - if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp)) {
> - raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + if (delta < RCU_STALL_RAT_DELAY || !rcu_gp_in_progress(rsp) ||
> + rcu_stall_report_in_progress) {

If you conditionally acquire the new ->stalllock, you shouldn't need
this added check.

> + raw_spin_unlock_irqrestore(&root->lock, flags);
> return;
> }
> rsp->jiffies_stall = jiffies + 3 * jiffies_till_stall_check() + 3;
> - raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + rcu_stall_report_in_progress++;

And with the new ->stalllock, rcu_stall_report_in_progress can go away.

> + raw_spin_unlock_irqrestore(&root->lock, flags);
>
> /*
> * OK, time to rat on our buddy...
> @@ -765,16 +770,23 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
> print_cpu_stall_info(rsp, rnp->grplo + cpu);
> ndetected++;
> }
> +
> + /*
> + * Push the timeout back as we go. With a slow serial
> + * console on a large machine, this may take a while.
> + */
> + raw_spin_lock_irqsave(&root->lock, flags);
> + rsp->jiffies_stall = jiffies + 3 * jiffies_till_stall_check() + 3;
> + raw_spin_unlock_irqrestore(&root->lock, flags);

And the separate ->stalllock should make this unnecessary as well.
However, updating rsp->jiffies_stall periodically is a good idea
in order to decrease cache thrashing on ->stalllock.

> }
>
> /*
> * Now rat on any tasks that got kicked up to the root rcu_node
> * due to CPU offlining.
> */
> - rnp = rcu_get_root(rsp);
> - raw_spin_lock_irqsave(&rnp->lock, flags);
> - ndetected = rcu_print_task_stall(rnp);
> - raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + raw_spin_lock_irqsave(&root->lock, flags);
> + ndetected = rcu_print_task_stall(root);
> + raw_spin_unlock_irqrestore(&root->lock, flags);

And the separate lock makes this change unnecessary, also.

> print_cpu_stall_info_end();
> printk(KERN_CONT "(detected by %d, t=%ld jiffies)\n",
> @@ -784,6 +796,10 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
> else if (!trigger_all_cpu_backtrace())
> dump_stack();
>
> + raw_spin_lock_irqsave(&root->lock, flags);
> + rcu_stall_report_in_progress--;
> + raw_spin_unlock_irqrestore(&root->lock, flags);

Ditto here.

> +
> /* If so configured, complain about tasks blocking the grace period. */
>
> rcu_print_detail_task_stall(rsp);
> @@ -796,6 +812,17 @@ static void print_cpu_stall(struct rcu_state *rsp)
> unsigned long flags;
> struct rcu_node *rnp = rcu_get_root(rsp);
>
> + raw_spin_lock_irqsave(&rnp->lock, flags);
> + if (rcu_stall_report_in_progress) {
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);
> + return;
> + }
> +
> + /* Reset timeout, dump_stack() may take a while on large machines. */
> + rsp->jiffies_stall = jiffies + 3 * jiffies_till_stall_check() + 3;
> + rcu_stall_report_in_progress++;
> + raw_spin_unlock_irqrestore(&rnp->lock, flags);

And do trylock (without disabling irqs) here as well. No need for
rcu_stall_report_in_progress. You can update rsp->jiffies_stall
to reduce cache thrashing on ->stalllock.

> +
> /*
> * OK, time to rat on ourselves...
> * See Documentation/RCU/stallwarn.txt for info on how to debug
> @@ -813,6 +840,7 @@ static void print_cpu_stall(struct rcu_state *rsp)
> if (ULONG_CMP_GE(jiffies, rsp->jiffies_stall))
> rsp->jiffies_stall = jiffies +
> 3 * jiffies_till_stall_check() + 3;
> + rcu_stall_report_in_progress--;

And ->stalllock makes this unnecessary as well.

> raw_spin_unlock_irqrestore(&rnp->lock, flags);
>
> set_need_resched(); /* kick ourselves to get things going. */
>
> -Mike
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/