Re: [PATCH] rcu: Benefit from expedited grace period in __wait_rcu_gp

From: Raslan, KarimAllah
Date: Tue Oct 23 2018 - 11:13:55 EST


On Fri, 2018-10-19 at 13:21 -0700, Paul E. McKenney wrote:
> On Fri, Oct 19, 2018 at 07:45:51PM +0000, Raslan, KarimAllah wrote:
> >
> > On Fri, 2018-10-19 at 05:31 -0700, Paul E. McKenney wrote:
> > >
> > > On Fri, Oct 19, 2018 at 02:49:05AM +0200, KarimAllah Ahmed wrote:
> > > >
> > > >
> > > > When expedited grace-period is set, both synchronize_sched
> > > > synchronize_rcu_bh can be optimized to have a significantly lower latency.
> > > >
> > > > Improve wait_rcu_gp handling to also account for expedited grace-period.
> > > > The downside is that wait_rcu_gp will not wait anymore for all RCU variants
> > > > concurrently when an expedited grace-period is set, however, given the
> > > > improved latency it does not really matter.
> > > >
> > > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
> > > > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > > > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > > > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> > > > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> > > > Cc: linux-kernel@xxxxxxxxxxxxxxx
> > > > Signed-off-by: KarimAllah Ahmed <karahmed@xxxxxxxxx>
> > >
> > > Cute!
> > >
> > > Unfortunately, there are a few problems with this patch:
> > >
> > > 1. I will be eliminating synchronize_rcu_mult() due to the fact that
> > > the upcoming RCU flavor consolidation eliminates its sole caller.
> > > See 5fc9d4e000b1 ("rcu: Eliminate synchronize_rcu_mult()")
> > > in my -rcu tree. This would of course also eliminate the effects
> > > of this patch.
> >
> > Your patch covers our use-case already, but I still think that the semanticsÂ
> > forÂwait_rcu_gp is not clear to me.
> >
> > The problem for us was that sched_cpu_deactivate would call
> > synchronize_rcu_mult which does not check for "expedited" at all. So even
> > though we are already using rcu_expedited sysctl variable, synchronize_rcu_multÂ
> > was just ignoring it.
> >
> > That being said, I indeed overlooked rcu_normal and that it takes precedenceÂ
> > over expedited and I did not notice at all the deadlock you mentioned below!
> >
> > That can however be easily fixed by also checking for !rcu_gp_is_normal.
>
> ???
>
> The aforementioned 5fc9d4e000b1 commit replaces the synchronize_rcu_mult()
> with synchronize_rcu(), which really would be subject to the sysfs
> variables. Of course, this is not yet in mainline, so it perhaps cannot
> solve your immediate problem, which probably involve older kernels in
> any case. More on this below...
>
> >
> > >
> > > 2. The real-time guys' users are not going to be at all happy
> > > with the IPIs resulting from the _expedited() API members.
> > > Yes, they can boot with rcupdate.rcu_normal=1, but they don't
> > > always need that big a hammer, and use of this kernel parameter
> > > can slow down boot, hibernation, suspend, network configuration,
> > > and much else besides. We therefore don't want them to have to
> > > use rcupdate.rcu_normal=1 unless absolutely necessary.
> >
> > I might be missing something here. Why would they need to "explicitly" useÂ
> > rcu_normal? If rcu_expedited is set, would not the expected behavior is to callÂ
> > into the expedited version?
> >
> > My patch should only activate *expedited* if and only if it is set.
>
> You are right, I was confused. However...
>
> >
> > I think I might be misunderstanding the expected behaviorÂ
> > fromÂsynchronize_rcu_mult. My understanding is that something like:
> >
> > synchronize_rcu_mult(call_rcu_sched) and synchronize_rcu() should have anÂ
> > identical behavior, right?
>
> You would clearly prefer that it did, and the commit log does seem to
> read that way, but synchronize_rcu_mult() is going away anyway, so there
> isn't a whole lot of point in arguing about what it should have done.
> And the eventual implementation (with 5fc9d4e000b1 or its successor)
> will act as you want.
>
> >
> > At least in this commit:
> >
> > commit d7d34d5e46140 ("sched: Rely on synchronize_rcu_mult() de-duplication")
> >
> > .. the change clearly gives the impression that they can be usedÂ
> > interchangeably. The problem is that this is not true when you look at theÂ
> > implementation. One of them (i.e. synchronize_rcu) will respect the
> > expedite_rcu flag set by sysfs while the other (i.e. synchronize_rcu_mult)Â
> > simply ignores it.
> >
> > So my patch is about making sure that both of the variants actually respectÂ
> > it.
>
> I am guessing that you need to make an older version of the kernel
> expedite the CPU-hotplug grace periods. I am also guessing that you can
> carry patches to your kernels. If so, I suggest the following simpler
> change to sched_cpu_deactivate() in kernel/sched/core.c:
>
> if (rcu_gp_is_expedited()) {
> synchronize_sched_expedited();
> if (IS_ENABLED(CONFIG_PREEMPT))
> synchronize_rcu_expedited();
> } else {
> synchronize_rcu_mult(call_rcu, call_rcu_sched);
> }
>
> As soon as this patch conflicts due to the synchronize_rcu_mult()
> becoming synchronize_rcu(), you can drop the patch. And this is the only
> use of synchronize_rcu_mult(), so this approach loses no generality.
> Longer term, this patch might possibly be the backport of 5fc9d4e000b1
> back to v4.14, but at the end of the day this is up to the various
> -stable maintainers.
>
> Hmmm... If you are running CONFIG_PREEMPT=n, you can use an even
> simpler replacement for synchronize_rcu_mult():
>
> synchronize_sched_expedited(); /* Bug if CONFIG_PREEMPT=y!!! */
>
> Would either of those two approaches work for you, or am I still missing
> something?

Sorry, I was not clear. The original commit in your -rcu tree already works forÂ
us. So that is sorted out, thank you :)

In my previous reply, when I was referring to synchronize_rcu_mult I really
also wanted to also point to __wait_rcu_gp. With the current state in your -rcuÂ
tree, __wait_rcu_gp does not look whether "expedited" is enabled or not. This
is currently delegated to the callers (e.g. synchronize_rcu andÂ
synchronize_rcu_bh). So any new direct users of __wait_rcu_gp would also missÂ
this check (i.e. just like what happened in synchronize_rcu_mult). If we wantÂ
__wait_rcu_gp to always respect rcu_expedited and rcu_normal flags, we mightÂ
want to pull these checks into __wait_rcu_gp instead and remove them from theÂ
callers.


>
> Thanx, Paul
>
> >
> > >
> > > 3. If the real-time guys' users were to have booted with
> > > rcupdate.rcu_normal=1, then synchronize_sched_expedited()
> > > would invoke _synchronize_rcu_expedited, which would invoke
> > > wait_rcu_gp(), which would invoke _wait_rcu_gp() which would
> > > invoke __wait_rcu_gp(), which, given your patch, would in turn
> > > invoke synchronize_sched_expedited(). This situation could
> > > well prevent their systems from meeting their response-time
> > > requirements.
> > >
> > > So I cannot accept this patch nor for that matter any similar patch.
> > >
> > > But what were you really trying to get done here? If you were thinking
> > > of adding another synchronize_rcu_mult(), the flavor consolidation will
> > > make that unnecessary in most cases. If you are trying to speed up
> > > CPU-hotplug operations, I suggest using the rcu_expedited sysctl variable
> > > when taking a CPU offline. If something else, please let me know what
> > > it is so that we can work out how the problem might best be solved.
> > >
> > > Thanx, Paul
> > >
> > > >
> > > >
> > > > ---
> > > > kernel/rcu/update.c | 34 ++++++++++++++++++++++++++++------
> > > > 1 file changed, 28 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c
> > > > index 68fa19a..44b8817 100644
> > > > --- a/kernel/rcu/update.c
> > > > +++ b/kernel/rcu/update.c
> > > > @@ -392,13 +392,27 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
> > > > might_sleep();
> > > > continue;
> > > > }
> > > > - init_rcu_head_on_stack(&rs_array[i].head);
> > > > - init_completion(&rs_array[i].completion);
> > > > +
> > > > for (j = 0; j < i; j++)
> > > > if (crcu_array[j] == crcu_array[i])
> > > > break;
> > > > - if (j == i)
> > > > - (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
> > > > + if (j != i)
> > > > + continue;
> > > > +
> > > > + if ((crcu_array[i] == call_rcu_sched ||
> > > > + crcu_array[i] == call_rcu_bh)
> > > > + && rcu_gp_is_expedited()) {
> > > > + if (crcu_array[i] == call_rcu_sched)
> > > > + synchronize_sched_expedited();
> > > > + else
> > > > + synchronize_rcu_bh_expedited();
> > > > +
> > > > + continue;
> > > > + }
> > > > +
> > > > + init_rcu_head_on_stack(&rs_array[i].head);
> > > > + init_completion(&rs_array[i].completion);
> > > > + (crcu_array[i])(&rs_array[i].head, wakeme_after_rcu);
> > > > }
> > > >
> > > > /* Wait for all callbacks to be invoked. */
> > > > @@ -407,11 +421,19 @@ void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
> > > > (crcu_array[i] == call_rcu ||
> > > > crcu_array[i] == call_rcu_bh))
> > > > continue;
> > > > +
> > > > + if ((crcu_array[i] == call_rcu_sched ||
> > > > + crcu_array[i] == call_rcu_bh)
> > > > + && rcu_gp_is_expedited())
> > > > + continue;
> > > > +
> > > > for (j = 0; j < i; j++)
> > > > if (crcu_array[j] == crcu_array[i])
> > > > break;
> > > > - if (j == i)
> > > > - wait_for_completion(&rs_array[i].completion);
> > > > + if (j != i)
> > > > + continue;
> > > > +
> > > > + wait_for_completion(&rs_array[i].completion);
> > > > destroy_rcu_head_on_stack(&rs_array[i].head);
> > > > }
> > > > }
> > > > --
> > > > 2.7.4
> > > >
> > >
> > Amazon Development Center Germany GmbH
> > Berlin - Dresden - Aachen
> > main office: Krausenstr. 38, 10117 Berlin
> > Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
> > Ust-ID: DE289237879
> > Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
>
>
Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B