Re: [PATCH rcu/urgent 0/6] Fixes for RCU/scheduler/irq-threads trainwreck

From: Ben Greear
Date: Wed Jul 20 2011 - 16:56:26 EST


On 07/20/2011 01:33 PM, Paul E. McKenney wrote:
On Wed, Jul 20, 2011 at 09:57:42PM +0200, Ingo Molnar wrote:

* Ingo Molnar <mingo@xxxxxxx> wrote:


* Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx> wrote:

If my guess is correct, then the minimal non-RCU_BOOST fix is #4
(which drags along #3) and #6. Which are not one-liners, but
somewhat smaller:

b/kernel/rcutree_plugin.h | 12 ++++++------
b/kernel/softirq.c | 12 ++++++++++--
kernel/rcutree_plugin.h | 31 +++++++++++++++++++++++++------
3 files changed, 41 insertions(+), 14 deletions(-)

That's half the patch size and half the patch count.

PeterZ's question is relevant: since we apparently had similar bugs
in v2.6.39 as well, what changed in v3.0 that makes them so urgent
to fix?

If it's just better instrumentation that proves them better then
i'd suggest fixing this in v3.1 and not risking v3.0 with an
unintended side effect.

Ok, i looked some more at the background and the symptoms that people
are seeing: kernel crashes and lockups. I think we want these
problems fixed in v3.0, even if it was the recent introduction of
RCU_BOOST that made it really prominent.

Having put some testing into your rcu/urgent branch today i'd feel
more comfortable with taking this plus perhaps an RCU_BOOST disabling
patch. That makes it all fundamentally tested by a number of people
(including those who reported/reproduced the problems).

RCU_BOOST is currently default=n. Is that sufficient? If not, one

Not if it remains broken, I think, unless you put it under CONFIG_BROKEN
or something. Otherwise, folks are liable to turn it on and not realize
it's the cause of subtle bugs.

For what it's worth, my tests have been running clean for around 2 hours,
so the full set of fixes with RCU_BOOST appears good so far. I'll let it
continue to run at least overnight to make sure I'm not just getting lucky...

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com
