Re: RCU qsmask !=0 warnings on large-SMP...

From: Steffen Persvold
Date: Fri Jan 27 2012 - 06:09:33 EST


On 1/26/2012 20:26, Paul E. McKenney wrote:
> On Thu, Jan 26, 2012 at 04:04:37PM +0100, Steffen Persvold wrote:
> > On 1/26/2012 02:58, Paul E. McKenney wrote:
> > > On Wed, Jan 25, 2012 at 11:48:58PM +0100, Steffen Persvold wrote:
> > > > []
> > >
> > > This looks like it will produce useful information, but I am not seeing
> > > output from it below.
> > >
> > > 							Thanx, Paul

> > This run, it was CPU 24 that triggered the issue:
> >
> > This line is the printout for the root level:
> >
> > [ 231.572688] CPU 24, treason uncloaked, rsp @ ffffffff81a1cd80 (rcu_sched), rnp @ ffffffff81a1cd80(r) qsmask=0x1f, c=5132 g=5132 nc=5132 ng=5133 sc=5132 sg=5133 mc=5132 mg=5133

> OK, so the rcu_state structure (sc and sg) believes that grace period
> 5133 has started but not completed, as expected. Strangely enough, so
> do the root rcu_node structure (nc and ng) and the CPU's leaf rcu_node
> structure (mc and mg).
>
> The per-CPU rcu_data structure (c and g) does not yet know about the
> new 5133 grace period, as expected.
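
A line of this form amounts to printing, side by side, the grace-period
numbers recorded at the three levels Paul names. A minimal sketch of such
a debug hook, sitting in kernel/rcutree.c where these structures are
visible; the function name and format string are illustrative, not the
actual patch used for this report:

static void rcu_debug_print_gp(struct rcu_state *rsp,
			       struct rcu_node *rnp,
			       struct rcu_data *rdp)
{
	/*
	 * c/g:   per-CPU rcu_data          (rdp->completed, rdp->gpnum)
	 * nc/ng: rcu_node being examined   (rnp->completed, rnp->gpnum)
	 * sc/sg: global rcu_state          (rsp->completed, rsp->gpnum)
	 * mc/mg: this CPU's leaf rcu_node  (rdp->mynode)
	 */
	printk(KERN_INFO "CPU %d, rsp @ %p (%s), rnp @ %p qsmask=0x%lx, "
	       "c=%lu g=%lu nc=%lu ng=%lu sc=%lu sg=%lu mc=%lu mg=%lu\n",
	       smp_processor_id(), rsp, rsp->name, rnp, rnp->qsmask,
	       rdp->completed, rdp->gpnum,
	       rnp->completed, rnp->gpnum,
	       rsp->completed, rsp->gpnum,
	       rdp->mynode->completed, rdp->mynode->gpnum);
}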

> So this is the code in kernel/rcutree.c:rcu_start_gp() that does the
> initialization:

> 	rcu_for_each_node_breadth_first(rsp, rnp) {
> 		raw_spin_lock(&rnp->lock); /* irqs already disabled. */
> 		rcu_preempt_check_blocked_tasks(rnp);
> 		rnp->qsmask = rnp->qsmaskinit;
> 		rnp->gpnum = rsp->gpnum;
> 		rnp->completed = rsp->completed;
> 		if (rnp == rdp->mynode)
> 			rcu_start_gp_per_cpu(rsp, rnp, rdp);
> 		rcu_preempt_boost_start_gp(rnp);
> 		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
> 					    rnp->level, rnp->grplo,
> 					    rnp->grphi, rnp->qsmask);
> 		raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
> 	}
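
For context on the ordering: the traversal macro used above walks the flat
->node[] array from index 0 upward, and since that array is laid out level
by level, the root is initialized first, then the internal nodes, then the
leaves. From kernel/rcutree.h of this era, approximately:

/* Visit the root first, then internal nodes, then the leaves. */
#define rcu_for_each_node_breadth_first(rsp, rnp) \
	for ((rnp) = &(rsp)->node[0]; \
	     (rnp) < &(rsp)->node[NUM_RCU_NODES]; (rnp)++)

This ordering is what makes the expectations below about which levels have
been updated at a given print point well-defined.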

> I am assuming that your debug prints are still invoked right after
> the raw_spin_lock() above. If so, I would expect nc==ng and mc==mg.
> Even if your debug prints followed the assignments to rnp->gpnum and
> rnp->completed, I would expect mc==mg for the root and internal rcu_node
> structures. But you say below that you get the same values throughout,
> and in that case, I would expect the leaf rcu_node structure to show
> something different than the root and internal structures.
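
To make the two candidate print locations concrete, here is the same loop
with those expectations spelled out; the code is as quoted above, and the
Point A / Point B comments are annotations:

rcu_for_each_node_breadth_first(rsp, rnp) {
	raw_spin_lock(&rnp->lock); /* irqs already disabled. */
	/*
	 * Point A: this rnp has not been updated yet, so nc==ng;
	 * the leaf (rdp->mynode) is visited last in the breadth-first
	 * walk, so it has not been updated either and mc==mg too.
	 */
	rcu_preempt_check_blocked_tasks(rnp);
	rnp->qsmask = rnp->qsmaskinit;
	rnp->gpnum = rsp->gpnum;
	rnp->completed = rsp->completed;
	/*
	 * Point B: nc!=ng now for this rnp, but on the root and
	 * internal passes the leaf is still untouched, so mc==mg
	 * would still hold there.
	 */
	if (rnp == rdp->mynode)
		rcu_start_gp_per_cpu(rsp, rnp, rdp);
	rcu_preempt_boost_start_gp(rnp);
	trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
				    rnp->level, rnp->grplo,
				    rnp->grphi, rnp->qsmask);
	raw_spin_unlock(&rnp->lock); /* irqs remain disabled. */
}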

> The code really does hold the root rcu_node lock at all calls to
> rcu_start_gp(), so I don't see how we could be getting two CPUs into that
> code at the same time, which would be one way that the rcu_node and
> rcu_data structures might get advance notice of the new grace period,
> but in that case, you would have more than one bit set in ->qsmask.

> So, any luck with the trace events for rcu_grace_period and
> rcu_grace_period_init?


I've successfully enabled them and they seem to work; however, once the issue is triggered, any attempt to access /sys/kernel/debug/tracing/trace just hangs :/
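
For reference, the rcu_grace_period_init event being enabled here records
exactly the per-node fields under discussion. A sketch of its declaration,
inferred from the call site quoted earlier; the authoritative version is
in include/trace/events/rcu.h and may differ in detail:

TRACE_EVENT(rcu_grace_period_init,

	TP_PROTO(char *rcuname, unsigned long gpnum, u8 level,
		 int grplo, int grphi, unsigned long qsmask),

	TP_ARGS(rcuname, gpnum, level, grplo, grphi, qsmask),

	TP_STRUCT__entry(
		__field(char *, rcuname)
		__field(unsigned long, gpnum)
		__field(u8, level)
		__field(int, grplo)
		__field(int, grphi)
		__field(unsigned long, qsmask)
	),

	TP_fast_assign(
		__entry->rcuname = rcuname;
		__entry->gpnum = gpnum;
		__entry->level = level;
		__entry->grplo = grplo;
		__entry->grphi = grphi;
		__entry->qsmask = qsmask;
	),

	/* One line per initialized rcu_node in the trace buffer. */
	TP_printk("%s %lu %u %d %d %lx",
		  __entry->rcuname, __entry->gpnum, __entry->level,
		  __entry->grplo, __entry->grphi, __entry->qsmask)
);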

Cheers,
--
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold