Re: [PATCH] rcu_sync: simplify the state machine, introduce __rcu_sync_enter()

From: Paul E. McKenney
Date: Thu Jul 21 2016 - 23:26:26 EST


On Thu, Jul 21, 2016 at 07:34:36PM +0200, Oleg Nesterov wrote:
> On 07/20, Paul E. McKenney wrote:
> >
> > On Wed, Jul 20, 2016 at 07:16:03PM +0200, Oleg Nesterov wrote:
> >
> > > Now, suppose we add the additional enter/exit's:
> > >
> > > freeze_super(sb)
> > > {
> > > // this doesn't block
> > > __rcu_sync_enter(SEM3);
> > > __rcu_sync_enter(SEM2);
> > > __rcu_sync_enter(SEM1);
> > >
> > > down_write(&sb->s_umount);
> > > if (NEED_TO_FREEZE) {
> > > percpu_down_write(SEM1);
> >
> > The above waits for the grace period initiated by __rcu_sync_enter(),
> > correct? Presumably "yes", because it will invoke rcu_sync_enter(), which
> > will see the state as GP_ENTER, and will thus wait.
>
> But if down_write() blocks and/or NEED_TO_FREEZE takes some time it
> could already see the GP_PASSED state, or at least it can sleep less.
>
> > But your point is that if !NEED_TO_FREEZE, we will get here without
> > waiting for a grace period.
> >
> > But why aren't the __rcu_sync_enter() and rcu_sync_exit() calls inside
> > the "if" statement?
>
> Yes, if we do __rcu_sync_enter() inside "if", then rcu_sync_exit() can't
> hit GP_ENTER.
>
> But why we should disallow this use-case? It does not complicate the code
> at all.

I do agree that it doesn't complicate the current implementation.
But it relies on a global lock, so I am not at all confident that this
implementation is the final word. I therefore tend to try to avoid
supporting more than is required.

And speaking of global locks, failing to discourage the pattern above
means that the code is unnecessarily acquiring three global locks,
which doesn't seem like a good thing to me.

> And see above, we want to initiate the GP "asap", so that we will sleep
> less later. Although yes, freeze_super() is not the best example. And
> __cgroup_procs_write() too, but note that cgroup_kn_lock_live() is rather
> heavy, takes the global locks, and can fail. So (ignoring the fact we
> are going to switch cgroup_threadgroup_rwsem into the slow mode for now)
> __rcu_sync_enter() at the start could help to lessen the time
> percpu_down_write(cgroup_threadgroup_rwsem) sleeps with the cgroup_mutex
> held.

I agree that there are use cases for beginning-of-time __rcu_sync_enter()
or whatever we end up naming it.

> > That aside, would it make sense to name __rcu_sync_enter() something
> > like rcu_sync_begin_to_enter(), rcu_sync_pre_enter() or some such?
> > Something to make it clear that it just starts the job and that something
> > else is needed to finish it.
>
> Sure. Agreed, will rename.

Thank you!

> > And here is an updated state table. I do not yet separately call out
> > __rcu_sync_enter(), though without it the rcu_sync_exit() transition
> > out of state B cannot happen.
>
> Thanks! I'll try to double-check it.

And thank you again!

Thanx, Paul