Re: [PATCH diagnostic] Re: HPET regression in 2.6.26 versus 2.6.25-- RCU problem

From: Paul E. McKenney
Date: Mon Aug 11 2008 - 09:17:38 EST


On Mon, Aug 11, 2008 at 01:38:17PM +0200, Ingo Molnar wrote:
>
> * Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:
>
> > And here is the patch. It is still a bit raw, so the results should
> > be viewed with some suspicion. It adds a default-off kernel config
> > option, CONFIG_RCU_CPU_STALL, which must be enabled.
> >
> > Rather than exponential backoff, it backs off to once per 30 seconds.
> > My feeling upon thinking on it was that if you have stalled RCU grace
> > periods for that long, a few extra printk() messages are probably the
> > least of your worries...
>
> while this won't debug problems where timer irqs are genuinely stuck for
> long periods of time, it should find problems with RCU completion logic
> itself in the presence of correct timer irqs - and the lack of any
> messages from this debug option should point the finger more firmly in
> the direction of stalled timer irqs.
>
> So i find this debug feature rather useful and have applied it to
> tip/core/rcu (and cleaned it up a bit). I renamed the config option to
> CONFIG_DEBUG_RCU_STALL to make it more in line with usual debug option
> names. Let's see whether -tip testing finds any false positives.

Sounds good!

For whatever it is worth, this diagnostic can also locate latency issues
in non-CONFIG_PREEMPT kernels, even when those problems are outside of
preempt_disable() regions. Latency tracer is of course a better tool
for things -inside- of preempt_disable() regions.
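
To illustrate the reporting policy described in the quoted patch
description (a fixed 30-second recheck rather than exponential backoff),
here is a rough userspace sketch, not the patch itself. The function
names, the 3-second initial delay, and the doubling schedule used for
comparison are all assumptions made for illustration; only the 30-second
recheck interval comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define FIRST_CHECK_SECS   3   /* hypothetical delay before the first warning */
#define RECHECK_SECS      30   /* "once per 30 seconds" from the description */

/*
 * Record in out[] the times (seconds since the grace period began) at
 * which a stall warning would be printed for a stall lasting stall_secs,
 * using a fixed recheck interval. Returns the number of warnings.
 */
static int fixed_interval_reports(int stall_secs, int *out, int max)
{
    int n = 0;

    assert(out != NULL && max > 0);
    for (int t = FIRST_CHECK_SECS; t <= stall_secs && n < max; t += RECHECK_SECS)
        out[n++] = t;
    return n;
}

/*
 * Same, but doubling the delay after each warning (one possible form of
 * exponential backoff, shown only for comparison).
 */
static int exponential_reports(int stall_secs, int *out, int max)
{
    int n = 0;

    assert(out != NULL && max > 0);
    for (int t = FIRST_CHECK_SECS; t <= stall_secs && n < max; t *= 2)
        out[n++] = t;
    return n;
}
```

Under these assumptions, a 600-second stall would produce 20 warnings
with the fixed interval versus 8 with doubling, which is the "few extra
printk() messages" trade-off mentioned in the quoted text.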

Thanx, Paul