Re: [tip:core/rcu] Revert "rcu: Decrease memory-barrier usage based on semi-formal proof"

From: Paul E. McKenney
Date: Tue May 24 2011 - 20:05:39 EST


On Tue, May 24, 2011 at 02:23:45PM -0700, Yinghai Lu wrote:
> On 05/23/2011 06:35 PM, Paul E. McKenney wrote:
> > On Mon, May 23, 2011 at 06:26:23PM -0700, Yinghai Lu wrote:
> >> On 05/23/2011 06:18 PM, Paul E. McKenney wrote:
> >>
> >>> OK, so it looks like I need to get this out of the way in order to track
> >>> down the delays. Or does reverting PeterZ's patch get you a stable
> >>> system, but with the longish delays in memory_dev_init()? If the latter,
> >>> it might be more productive to handle the two problems separately.
> >>>
> >>> For whatever it is worth, I do see about 5% increase in grace-period
> >>> duration when switching to kthreads. This is acceptable -- your
> >>> 30x increase clearly is completely unacceptable and must be fixed.
> >>> Other than that, the main thing that affects grace period duration is
> >>> the setting of CONFIG_HZ -- the smaller the HZ value, the longer the
> >>> grace-period duration.
> >>
> >> for my 1024g system when memory hotadd is enabled in kernel config:
> >> 1. current linus tree + tip tree: memory_dev_init will take about 100s.
> >> 2. current linus tree + tip tree + your tree - Peterz patch:
> >> a. on fedora 14 gcc: will cost about 4s: like old times
> >> b. on opensuse 11.3 gcc: will cost about 10s.
> >
> > So some patch in my tree that is not yet in tip makes things better?
> >
> > If so, could you please see which one? Maybe that would give me a hint
> > that could make things better on opensuse 11.3 as well.
>
> today's tip:
>
> [ 31.795597] cpu_dev_init done
> [ 40.930202] memory_dev_init done

One other question... What is memory_dev_init() doing to wait for so
many RCU grace periods? (Yes, I do need to fix the slowdowns in any
case, but I am curious.)

> after
>
> commit e219b351fc90c0f5304e16efbc603b3b78843ea1
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Mon May 16 02:44:06 2011 -0700
>
> rcu: Remove old memory barriers from rcu_process_callbacks()
>
> Second step of partitioning of commit e59fb3120b.
>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 3731141..011bf6f 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -1460,25 +1460,11 @@ __rcu_process_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
> */
> static void rcu_process_callbacks(void)
> {
> - /*
> - * Memory references from any prior RCU read-side critical sections
> - * executed by the interrupted code must be seen before any RCU
> - * grace-period manipulations below.
> - */
> - smp_mb(); /* See above block comment. */
> -
> __rcu_process_callbacks(&rcu_sched_state,
> &__get_cpu_var(rcu_sched_data));
> __rcu_process_callbacks(&rcu_bh_state, &__get_cpu_var(rcu_bh_data));
> rcu_preempt_process_callbacks();
>
> - /*
> - * Memory references from any later RCU read-side critical sections
> - * executed by the interrupted code must be seen after any RCU
> - * grace-period manipulations above.
> - */
> - smp_mb(); /* See above block comment. */
> -
> /* If we are last CPU on way to dyntick-idle mode, accelerate it. */
> rcu_needs_cpu_flush();
> }
>
> cause
>
> [ 32.235103] cpu_dev_init done
> [ 74.897943] memory_dev_init done
>
> then add
>
> commit d0d642680d4cf5cc2ccf542b74a3c8b7e197306b
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Mon May 16 02:52:04 2011 -0700
>
> rcu: Don't do reschedule unless in irq
>
> Condition the set_need_resched() in rcu_irq_exit() on in_irq(). This
> should be a no-op, because rcu_irq_exit() should only be called from irq.
>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 011bf6f..195b3a3 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -421,8 +421,9 @@ void rcu_irq_exit(void)
> WARN_ON_ONCE(rdtp->dynticks & 0x1);
>
> /* If the interrupt queued a callback, get out of dyntick mode. */
> - if (__this_cpu_read(rcu_sched_data.nxtlist) ||
> - __this_cpu_read(rcu_bh_data.nxtlist))
> + if (in_irq() &&
> + (__this_cpu_read(rcu_sched_data.nxtlist) ||
> + __this_cpu_read(rcu_bh_data.nxtlist)))
> set_need_resched();
> }
>
> got:
>
> [ 34.384490] cpu_dev_init done
> [ 86.656322] memory_dev_init done
>
>
> after
>
> commit fcfc28801f5b3b9c70616fc57e3a2c6f52014e14
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Mon May 16 14:27:31 2011 -0700
>
> rcu: Make rcu_enter_nohz() pay attention to nesting
>
> The old version of rcu_enter_nohz() forced RCU into nohz mode even if
> the nesting count was non-zero. This change causes rcu_enter_nohz()
> to hold off for non-zero nesting counts.
>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 195b3a3..99c6038 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -324,8 +324,8 @@ void rcu_enter_nohz(void)
> smp_mb(); /* CPUs seeing ++ must see prior RCU read-side crit sects */
> local_irq_save(flags);
> rdtp = &__get_cpu_var(rcu_dynticks);
> - rdtp->dynticks++;
> - rdtp->dynticks_nesting--;
> + if (--rdtp->dynticks_nesting == 0)
> + rdtp->dynticks++;
> WARN_ON_ONCE(rdtp->dynticks & 0x1);
> local_irq_restore(flags);
> }
>
> got:
>
> [ 32.414049] cpu_dev_init done
> [ 38.237979] memory_dev_init done

So this is best for you -- where we have done all but the last commit
of restoring "Decrease memory-barrier usage based on semi-formal proof".
It makes sense that this one would help, as it is eliminating delays
due to misnesting. These delays are not hangs, as force_quiescent_state()
will eventually force the right thing to happen, but getting rid of these
delays should indeed speed things up.
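
To make the nesting change concrete, here is a minimal user-space sketch of the two versions of the counter update. It is only a model: struct dynticks_model and the function names enter_nohz_old(), enter_nohz_new() and looks_idle() are made up for illustration, and the locking, memory barriers and per-CPU access of the real rcu_enter_nohz() are left out. It assumes, as the WARN_ON_ONCE() in the patch suggests, that an even ->dynticks value means the CPU is in dyntick-idle mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's per-CPU struct rcu_dynticks. */
struct dynticks_model {
	int dynticks;         /* assumed: even means dyntick-idle, odd means not idle */
	int dynticks_nesting; /* outstanding reasons for this CPU to stay non-idle */
};

/* Old update: flip the counter no matter what the nesting count says. */
void enter_nohz_old(struct dynticks_model *d)
{
	d->dynticks++;
	d->dynticks_nesting--;
}

/* New update: flip the counter only when the nesting count reaches zero. */
void enter_nohz_new(struct dynticks_model *d)
{
	if (--d->dynticks_nesting == 0)
		d->dynticks++;
}

/* Returns nonzero if the counter says the CPU is in dyntick-idle mode. */
int looks_idle(const struct dynticks_model *d)
{
	return (d->dynticks & 0x1) == 0;
}

/* Print the state after one call, for side-by-side comparison. */
void show(const char *tag, const struct dynticks_model *d)
{
	printf("%s: dynticks=%d nesting=%d idle=%d\n",
	       tag, d->dynticks, d->dynticks_nesting, looks_idle(d));
}
```

Starting from dynticks == 1 and dynticks_nesting == 2, one call to the old version leaves the counter even (so the CPU looks idle) while the nesting count is still 1. The new version leaves the counter odd until a second call brings the nesting count to zero.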

> after:
> commit bcd6e68330f893a81b3519ab3c5fc2bebbc9988c
> Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> Date: Tue Sep 7 10:38:22 2010 -0700
>
> rcu: Decrease memory-barrier usage based on semi-formal proof
> ...
>
> got:
>
> [ 32.447936] cpu_dev_init done
> [ 111.027066] memory_dev_init done

So there is something nasty in this patch.

Not seeing it immediately, but it does give me some focus for both
code inspection and possible diagnostic patches.

> after
>
> commit fbb753fb9dd62318d27fa070c686423ced139817
> Author: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> Date: Wed May 11 05:33:33 2011 -0700
>
> atomic: Add atomic_or()
>
> An atomic_or() function is needed by TREE_RCU to avoid deadlock, so
> add a generic version.
>
> Signed-off-by: Paul E. McKenney <paul.mckenney@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>
> diff --git a/include/linux/atomic.h b/include/linux/atomic.h
> index 96c038e..ee456c7 100644
> --- a/include/linux/atomic.h
> +++ b/include/linux/atomic.h
> @@ -34,4 +34,17 @@ static inline int atomic_inc_not_zero_hint(atomic_t *v, int hint)
> }
> #endif
>
> +#ifndef CONFIG_ARCH_HAS_ATOMIC_OR
> +static inline void atomic_or(int i, atomic_t *v)
> +{
> + int old;
> + int new;
> +
> + do {
> + old = atomic_read(v);
> + new = old | i;
> + } while (atomic_cmpxchg(v, old, new) != old);
> +}
> +#endif /* #ifndef CONFIG_ARCH_HAS_ATOMIC_OR */
> +
> #endif /* _LINUX_ATOMIC_H */
>
> got:
>
> [ 32.803704] cpu_dev_init done
> [ 99.171292] memory_dev_init done

So the difference between these two is noise, I hope. Adding a static
inline function that is not used should not have an effect on performance.
Still, the difference between 6 seconds and 60 seconds rises far above
this noise level, so the big differences are likely quite real.
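
For reference, the per-step numbers can be recomputed from the timestamps quoted above. The sketch below assumes that the gap between the "cpu_dev_init done" and "memory_dev_init done" messages is a fair proxy for the time spent in memory_dev_init(); the struct and function names are made up for illustration, and the labels are just the abbreviated commit IDs from this thread.

```c
#include <assert.h>
#include <stdio.h>

/* One pair of boot-log timestamps (seconds), copied from the thread. */
struct boot_sample {
	const char *label;           /* abbreviated commit ID or tree name */
	double cpu_dev_init_done;    /* timestamp of "cpu_dev_init done" */
	double memory_dev_init_done; /* timestamp of "memory_dev_init done" */
};

const struct boot_sample samples[] = {
	{ "today's tip", 31.795597,  40.930202 },
	{ "e219b351",    32.235103,  74.897943 },
	{ "d0d64268",    34.384490,  86.656322 },
	{ "fcfc2880",    32.414049,  38.237979 },
	{ "bcd6e683",    32.447936, 111.027066 },
	{ "fbb753fb",    32.803704,  99.171292 },
};

const int nr_samples = sizeof(samples) / sizeof(samples[0]);

/* Assumed proxy for memory_dev_init() duration: gap between the two messages. */
double memory_dev_init_seconds(const struct boot_sample *s)
{
	return s->memory_dev_init_done - s->cpu_dev_init_done;
}

/* Print one line per sample. */
void print_samples(void)
{
	int i;

	for (i = 0; i < nr_samples; i++)
		printf("%-12s %8.3f s\n", samples[i].label,
		       memory_dev_init_seconds(&samples[i]));
}
```

Under that assumption the fcfc2880 step is the only one under 6 seconds, and the last two steps are both over a minute, roughly 12 seconds apart from each other.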

Thanx, Paul