Re: [PATCH] tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem

From: Zhouyi Zhou
Date: Thu Jan 19 2023 - 18:53:22 EST


On Fri, Jan 20, 2023 at 4:45 AM Joel Fernandes (Google)
<joel@xxxxxxxxxxxxxxxxx> wrote:
>
> For CONFIG_NO_HZ_FULL systems, the tick_do_timer_cpu cannot be offlined.
> However, cpu_is_hotpluggable() still returns true for those CPUs. This causes
> torture tests that do offlining to end up trying to offline this CPU causing
> test failures. Such failure happens on all architectures.
>
> Fix it by asking the opinion of the nohz subsystem on whether the CPU can
> be hotplugged.
>
> [ Apply Frederic Weisbecker feedback on refactoring tick_nohz_cpu_down(). ]
Thanks for your fantastic work!
I applied this fix to linux-5.15.y and started a new round of rcutorture
tests on a PPC VM at the Open Source Lab of Oregon State University.
Could you please wait for the tests to finish?

The test results for linux-5.15.y before your patch can be viewed at [1].
The patched linux-5.15.y source code can be viewed at [2].
The ongoing test of the patched linux-5.15.y can be viewed at [3].

[1] http://140.211.169.189/linux-stable-rc/tools/testing/selftests/rcutorture/res/2023.01.18-13.22.39-torture/
[2] http://140.211.169.189/linux-stable-rc/
[3] http://140.211.169.189/linux-stable-rc/tools/testing/selftests/rcutorture/res/2023.01.19-23.40.55-torture/

Hope to continue to benefit the community.

Thank you all
Zhouyi
>
> Cc: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Cc: "Paul E. McKenney" <paulmck@xxxxxxxxxx>
> Cc: Zhouyi Zhou <zhouzhouyi@xxxxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Marc Zyngier <maz@xxxxxxxxxx>
> Cc: rcu <rcu@xxxxxxxxxxxxxxx>
> Fixes: 2987557f52b9 ("driver-core/cpu: Expose hotpluggability to the rest of the kernel")
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
> drivers/base/cpu.c | 3 ++-
> include/linux/tick.h | 2 ++
> kernel/time/tick-sched.c | 12 +++++++++++-
> 3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 55405ebf23ab..450dca235a2f 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -487,7 +487,8 @@ static const struct attribute_group *cpu_root_attr_groups[] = {
>  bool cpu_is_hotpluggable(unsigned int cpu)
>  {
>  	struct device *dev = get_cpu_device(cpu);
> -	return dev && container_of(dev, struct cpu, dev)->hotpluggable;
> +	return dev && container_of(dev, struct cpu, dev)->hotpluggable
> +		&& tick_nohz_cpu_hotpluggable(cpu);
>  }
>  EXPORT_SYMBOL_GPL(cpu_is_hotpluggable);
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index bfd571f18cfd..9459fef5b857 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -216,6 +216,7 @@ extern void tick_nohz_dep_set_signal(struct task_struct *tsk,
> enum tick_dep_bits bit);
> extern void tick_nohz_dep_clear_signal(struct signal_struct *signal,
> enum tick_dep_bits bit);
> +extern bool tick_nohz_cpu_hotpluggable(unsigned int cpu);
>
> /*
> * The below are tick_nohz_[set,clear]_dep() wrappers that optimize off-cases
> @@ -280,6 +281,7 @@ static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
>
> static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
> static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { }
> +static inline bool tick_nohz_cpu_hotpluggable(unsigned int cpu) { return true; }
>
> static inline void tick_dep_set(enum tick_dep_bits bit) { }
> static inline void tick_dep_clear(enum tick_dep_bits bit) { }
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 9c6f661fb436..383a060f30c5 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -510,7 +510,7 @@ void __init tick_nohz_full_setup(cpumask_var_t cpumask)
>  	tick_nohz_full_running = true;
>  }
>
> -static int tick_nohz_cpu_down(unsigned int cpu)
> +static int tick_nohz_cpu_hotplug_ret(unsigned int cpu)
>  {
>  	/*
>  	 * The tick_do_timer_cpu CPU handles housekeeping duty (unbound
> @@ -522,6 +522,16 @@ static int tick_nohz_cpu_down(unsigned int cpu)
>  	return 0;
>  }
>
> +static int tick_nohz_cpu_down(unsigned int cpu)
> +{
> +	return tick_nohz_cpu_hotplug_ret(cpu);
> +}
> +
> +bool tick_nohz_cpu_hotpluggable(unsigned int cpu)
> +{
> +	return tick_nohz_cpu_hotplug_ret(cpu) == 0;
> +}
> +
>  void __init tick_nohz_init(void)
>  {
>  	int cpu, ret;
> --
> 2.39.0.246.g2a6d74b583-goog
>
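
For anyone following the thread without a kernel tree at hand, here is a
rough user-space sketch of the logic the patch wires together. It is not
the kernel code. Every model_* name and MODEL_NR_CPUS is made up for
illustration, and the body of the check (return -EBUSY when nohz_full is
running and the CPU is tick_do_timer_cpu) is an assumption, since the
hunk above elides that part of tick_nohz_cpu_hotplug_ret().

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for this model */

/* Stand-ins for tick_nohz_full_running and tick_do_timer_cpu. */
static bool model_nohz_full_running = true;
static unsigned int model_tick_do_timer_cpu = 0;

/* Stand-in for the per-CPU "hotpluggable" flag in struct cpu. */
static bool model_hotpluggable[MODEL_NR_CPUS] = { true, true, true, true };

/*
 * Shared check, mirroring the role of tick_nohz_cpu_hotplug_ret().
 * Assumption: the timekeeping CPU is refused with -EBUSY while
 * nohz_full is active, and every other CPU gets 0.
 */
static int model_tick_nohz_cpu_hotplug_ret(unsigned int cpu)
{
	if (model_nohz_full_running && model_tick_do_timer_cpu == cpu)
		return -EBUSY;
	return 0;
}

/* Boolean view of the same check, as tick_nohz_cpu_hotpluggable() does. */
static bool model_tick_nohz_cpu_hotpluggable(unsigned int cpu)
{
	return model_tick_nohz_cpu_hotplug_ret(cpu) == 0;
}

/*
 * Mirrors the patched cpu_is_hotpluggable(): the CPU must exist, be
 * marked hotpluggable, and also be acceptable to the nohz model.
 */
static bool model_cpu_is_hotpluggable(unsigned int cpu)
{
	return cpu < MODEL_NR_CPUS && model_hotpluggable[cpu] &&
	       model_tick_nohz_cpu_hotpluggable(cpu);
}
```

With these defaults, model CPU 0 plays the role of tick_do_timer_cpu, so
model_cpu_is_hotpluggable(0) is false while CPUs 1 to 3 remain eligible.
A torture-style test that consults this predicate before picking a CPU
to offline would therefore skip CPU 0.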