Re: Re: [PATCH] rcu: shrink each possible cpu krcp

From: Joel Fernandes
Date: Wed Aug 19 2020 - 09:57:05 EST


On Wed, Aug 19, 2020 at 03:00:55AM +0000, Zhang, Qiang wrote:
>
>
> ________________________________________
> From: linux-kernel-owner@xxxxxxxxxxxxxxx <linux-kernel-owner@xxxxxxxxxxxxxxx> on behalf of Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> Sent: August 19, 2020 8:04
> To: Paul E. McKenney
> Cc: Uladzislau Rezki; Zhang, Qiang; Josh Triplett; Steven Rostedt; Mathieu Desnoyers; Lai Jiangshan; rcu; LKML
> Subject: Re: [PATCH] rcu: shrink each possible cpu krcp
>
> On Tue, Aug 18, 2020 at 6:02 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index b8ccd7b5af82..6decb9ad2421 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -2336,10 +2336,15 @@ int rcutree_dead_cpu(unsigned int cpu)
> > > {
> > > struct rcu_data *rdp = per_cpu_ptr(&rcu_data, cpu);
> > > struct rcu_node *rnp = rdp->mynode; /* Outgoing CPU's rdp & rnp. */
> > > + struct kfree_rcu_cpu *krcp;
> > >
> > > if (!IS_ENABLED(CONFIG_HOTPLUG_CPU))
> > > return 0;
> > >
> > > + /* Drain the krcp of this CPU. IRQs should be disabled? */
> > > + krcp = this_cpu_ptr(&krc);
> > > + schedule_delayed_work(&krcp->monitor_work, 0);
> > > +
> > >
> > > A cpu can be offlined and its krcp will be stuck until a shrinker is invoked.
> > > Maybe never.
> >
> > Does the same apply to its kmalloc() per-CPU caches? If so, I have a
> > hard time getting too worried about it. ;-)
>
> >Looking at slab_offline_cpu(), that calls cancel_delayed_work_sync()
> >on the cache reaper whose job is to flush the per-cpu caches. So I
> >believe during CPU offlining, the per-cpu slab caches are flushed.
> >
> >thanks,
> >
> >- Joel
>
> When a CPU goes offline, slab/slub only flushes the free objects in the
> offline CPU's cache, putting them back on the node lists or returning them
> to the buddy system; objects that are still in use stay in the offline
> CPU's cache.
>
> If we want to clean up the per-CPU "krcp" objects when a CPU goes offline,
> we should free the "krcp" cache objects in "rcutree_offline_cpu", which is
> called before the other RCU CPU-offline functions and runs in the
> "cpuhp/%u" per-CPU thread.
>

Could you please wrap your text properly when you post to the mailing list?
Thanks. I fixed it for you above.

> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 8ce77d9ac716..1812d4a1ac1b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3959,6 +3959,7 @@ int rcutree_offline_cpu(unsigned int cpu)
> unsigned long flags;
> struct rcu_data *rdp;
> struct rcu_node *rnp;
> + struct kfree_rcu_cpu *krcp;
>
> rdp = per_cpu_ptr(&rcu_data, cpu);
> rnp = rdp->mynode;
> @@ -3970,6 +3971,11 @@ int rcutree_offline_cpu(unsigned int cpu)
>
> // nohz_full CPUs need the tick for stop-machine to work quickly
> tick_dep_set(TICK_DEP_BIT_RCU);
> +
> + krcp = per_cpu_ptr(&krc, cpu);
> + raw_spin_lock_irqsave(&krcp->lock, flags);
> + schedule_delayed_work(&krcp->monitor_work, 0);
> + raw_spin_unlock_irqrestore(&krcp->lock, flags);
> return 0;

I realized the above is not good enough for what this is trying to do. Unlike
the slab, the new kfree_rcu objects cannot always be drained / submitted to
RCU because the previous batch may still be waiting for a grace period. So
the above code could very well return with the yet-to-be-submitted kfree_rcu
objects still in the cache.
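
For reference, the drain path in kernel/rcu/tree.c at this point looks roughly
like the following (paraphrased, not quoted verbatim): queue_kfree_rcu_work()
declines to start a new batch while the previous one is still in flight, so
the monitor work just rearms itself:

static void kfree_rcu_drain_unlock(struct kfree_rcu_cpu *krcp,
				   unsigned long flags)
{
	// Attempt to start a new batch.
	krcp->monitor_todo = false;
	if (queue_kfree_rcu_work(krcp)) {
		// Success: the cached objects now belong to a batch.
		raw_spin_unlock_irqrestore(&krcp->lock, flags);
		return;
	}

	// Previous RCU batch still in progress, try again later.
	krcp->monitor_todo = true;
	schedule_delayed_work(&krcp->monitor_work, KFREE_DRAIN_JIFFIES);
	raw_spin_unlock_irqrestore(&krcp->lock, flags);
}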

One option is to spin-wait here for monitor_todo to be false and keep calling
kfree_rcu_drain_unlock() till then.
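
Roughly, that option would look something like the sketch below. This is
untested, and kfree_rcu_offline_drain() is a made-up helper name, meant to be
called from the offline path in process context; it is only here to make the
idea concrete:

// Hypothetical helper, for illustration only (not in tree.c).
static void kfree_rcu_offline_drain(unsigned int cpu)
{
	struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);
	unsigned long flags;

	raw_spin_lock_irqsave(&krcp->lock, flags);
	while (krcp->monitor_todo) {
		// Drops krcp->lock; if the previous batch is still
		// waiting for a grace period, this just rearms the
		// monitor work and we go around again.
		kfree_rcu_drain_unlock(krcp, flags);
		cond_resched();
		raw_spin_lock_irqsave(&krcp->lock, flags);
	}
	raw_spin_unlock_irqrestore(&krcp->lock, flags);
}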

But then that's not good enough either, because if new objects are queued
when interrupts are enabled in the CPU offline path, then the cache will get
new objects after the previous set was drained. Further, spin waiting may
introduce deadlocks.

Another option is to switch the kfree_rcu() path to non-batching (so new
objects cannot be cached in the offline path and are submitted directly to
RCU), wait for a GP and then submit the work. But then I am not sure whether
the 1-argument kfree_rcu() would like that.
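
Spelled out, that idea would be something like the sketch below;
kfree_rcu_offline_nobatch() is a made-up name, step (1) has no existing
switch in tree.c, and the headless (1-argument) case is exactly where it
gets awkward:

// Purely hypothetical sketch of the non-batching idea above.
static void kfree_rcu_offline_nobatch(unsigned int cpu)
{
	struct kfree_rcu_cpu *krcp = per_cpu_ptr(&krc, cpu);

	// (1) Stop caching new objects on this CPU.  This would need a
	//     new "don't batch here" state checked by kfree_call_rcu();
	//     omitted because nothing like it exists today.

	// (2) Wait out the grace period the in-flight batch is blocked on.
	synchronize_rcu();

	// (3) Hand whatever is still cached to the workqueue.
	schedule_delayed_work(&krcp->monitor_work, 0);
}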

Probably Qiang's original fix using for_each_possible_cpu() is good enough for
the shrinker case, and then we can tackle the hotplug one.

thanks,

- Joel