Re: NOHZ: WARNING: at arch/x86/kernel/smp.c:123 native_smp_send_reschedule, round 2

From: Viresh Kumar
Date: Mon May 20 2013 - 09:43:19 EST


On 20 May 2013 18:53, Borislav Petkov <bp@xxxxxxxxx> wrote:
> I just confirmed that policy->cpus contains offlined cores with this:
>
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad82d23..e8c25f71e9b6 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -169,6 +169,9 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
> {
> 	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
>
> +	if (WARN_ON(!cpu_online(cpu)))
> +		return;
> +
> 	mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
> }

Hmm, so there is definitely a locking issue there.
Have you tried my patch? I am not sure it will fix everything, but it
may fix this.

> see splats collection below.
>
> And I don't think your fix above addresses the issue for the simple
> reason that if cpus go offline *before* you do get_online_cpus(), then
> policy->cpus will already contain offlined cpus.
>
> Rather, a better fix would be, IMHO, to do this (it works here, of course):
>
> ---
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 5af40ad82d23..58541b164494 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -17,6 +17,7 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> #include <asm/cputime.h>
> +#include <linux/cpu.h>
> #include <linux/cpufreq.h>
> #include <linux/cpumask.h>
> #include <linux/export.h>
> @@ -169,7 +170,15 @@ static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
> {
> 	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
>
> +	get_online_cpus();
> +
> +	if (!cpu_online(cpu))
> +		goto out;
> +
> 	mod_delayed_work_on(cpu, system_wq, &cdbs->work, delay);
> +
> + out:
> +	put_online_cpus();
> }
>
> void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,

This looks fine, but I want to fix the locking rather than just
hiding the issue. :)