Re: spinlock lockup on CPU#0

From: Justin Mattock
Date: Sat Apr 26 2008 - 15:32:48 EST


On Sat, Apr 26, 2008 at 7:14 PM, Ingo Molnar <mingo@xxxxxxx> wrote:
>
> * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
>
> > > Hopefully the numbers are right, and hopefully this provides enough
> > > info to help the kernel out
> >
> > Well that's cute. At a guess I'd say that acpi_processor_idle()
> > managed to call sched_clock_idle_wakeup_event() with local interrupts
> > enabled. We took an interrupt with rq->lock held and things went
> > downhill from there.
> >
> > Can you add this please, see if it triggers?
>
> there's fixes pending in this area. The main fix would be the one below.
>
> Ingo
>
> ---------------->
> Subject: idle (arch, acpi and apm) and lockdep
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Date: Fri, 25 Apr 2008 17:39:01 +0200
>
> On Fri, 2008-04-25 at 16:59 +0200, Peter Zijlstra wrote:
> > On Fri, 2008-04-25 at 14:47 +0000, Justin Mattock wrote:
> > > On Fri, Apr 25, 2008 at 12:11 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> > > > On Fri, 2008-04-25 at 00:24 +0000, Justin Mattock wrote:
> > > >
> > > > > [ 13.269763] =================================
> > > > > [ 13.270954] [ INFO: inconsistent lock state ]
> > > > > [ 13.271865] 2.6.25-04569-gb69d398 #3
> > > > > [ 13.272614] ---------------------------------
> > > > > [ 13.273521] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
> > > > > [ 13.274745] swapper/0 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > > > > [ 13.275787] (&rq->rq_lock_key){++..}, at: [<c01208e9>] sched_clock_idle_wakeup_event+0x43/0x74
> > > > > [ 13.276859] {in-hardirq-W} state was registered at:
> > > > > [ 13.276859] [<c01423d8>] __lock_acquire+0x419/0xb70
> > > > > [ 13.276859] [<c0142bae>] lock_acquire+0x7f/0xa6
> > > > > [ 13.280188] [<c0394132>] _spin_lock+0x1c/0x49
> > > > > [ 13.280188] [<c0121c44>] scheduler_tick+0x43/0x1bd
> > > > > [ 13.280188] [<c012d99f>] update_process_times+0x3d/0x49
> > > > > [ 13.283524] [<c013daf6>] tick_periodic+0x66/0x72
> > > > > [ 13.283524] [<c013db1b>] tick_handle_periodic+0x19/0x6a
> > > > > [ 13.283524] [<c0105a87>] timer_interrupt+0x47/0x4e
> > > > > [ 13.286855] [<c015cf64>] handle_IRQ_event+0x1a/0x4f
> > > > > [ 13.286855] [<c015e3e9>] handle_level_irq+0x7f/0xca
> > > > > [ 13.286855] [<c0105a27>] do_IRQ+0x71/0x8a
> > > > > [ 13.290190] [<c01043fe>] common_interrupt+0x2e/0x34
> > > > > [ 13.290190] [<c038bd6f>] calibrate_delay+0x8f/0x276
> > > > > [ 13.290190] [<c04f0883>] start_kernel+0x27c/0x2f8
> > > > > [ 13.293520] [<c04f0008>] __init_begin+0x8/0xa
> > > > > [ 13.293520] [<ffffffff>] 0xffffffff
> > > > > [ 13.293520] irq event stamp: 253965
> > > > > [ 13.296856] hardirqs last enabled at (253965): [<c0108069>] native_sched_clock+0xe7/0xff
> > > > > [ 13.296856] hardirqs last disabled at (253964): [<c0107fef>] native_sched_clock+0x6d/0xff
> > > > > [ 13.300190] softirqs last enabled at (253958): [<c0129cea>] __do_softirq+0xf9/0xff
> > > > > [ 13.300190] softirqs last disabled at (253953): [<c0129d3d>] do_softirq+0x4d/0x79
> > > > > [ 13.303522]
> > > > > [ 13.303522] other info that might help us debug this:
> > > > > [ 13.303522] no locks held by swapper/0.
> > > > > [ 13.303522]
> > > > > [ 13.303522] stack backtrace:
> > > > > [ 13.306852] Pid: 0, comm: swapper Not tainted 2.6.25-04569-gb69d398 #3
> > > > > [ 13.336851] [<c01409f9>] print_usage_bug+0x106/0x113
> > > > > [ 13.340185] [<c0141823>] mark_lock+0x1ed/0x3a5
> > > > > [ 13.343519] [<c014244d>] __lock_acquire+0x48e/0xb70
> > > > > [ 13.346852] [<c0142bae>] lock_acquire+0x7f/0xa6
> > > > > [ 13.350185] [<c01208e9>] ? sched_clock_idle_wakeup_event+0x43/0x74
> > > > > [ 13.353519] [<c0394132>] _spin_lock+0x1c/0x49
> > > > > [ 13.360185] [<c01208e9>] ? sched_clock_idle_wakeup_event+0x43/0x74
> > > > > [ 13.363518] [<c01208e9>] sched_clock_idle_wakeup_event+0x43/0x74
> >
> > Got it:
> >
> > acpi_idle_do_entry()
> > acpi_processor_ffh_cstate_enter()
> > mwait_idle_with_hints() (32 bit)
> > local_irq_enable()
> >
> > sched_clock_idle_wakeup_event()
> >
> >
> > I think my recent idle patches should address this, no?
> >
> > > > > [ 13.366851] [<f883919b>] acpi_idle_enter_bm+0x2b3/0x333 [processor]
> > > > > [ 13.370184] [<c02e5ec1>] cpuidle_idle_call+0x63/0x92
> > > > > [ 13.373517] [<c02e5e5e>] ? cpuidle_idle_call+0x0/0x92
> > > > > [ 13.380184] [<c010257d>] cpu_idle+0xb6/0xd6
> > > > > [ 13.383517] [<c0387071>] rest_init+0x49/0x4b
> > > > > [ 13.386850] =======================
>
> (I just found out I failed to copy LKML on the last discussion about
> these patches)
>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxx>
> ---
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>
> OK, so 25-mm1 gave a lockdep error which made me look into this.
>
> The first thing that I noticed was the horrible mess; the second thing I
> saw was hacks like: 71e93d15612c61c2e26a169567becf088e71b8ff
>
> The problem is that arch idle routines are somewhat inconsitent with
> their IRQ state handling and instead of fixing _that_, we go paper over
> the problem.
>
> So the thing I've tried to do is set a standard for idle routines and
> fix them all up to adhere to that. So the rules are:
>
> idle routines are entered with IRQs disabled
> idle routines will exit with IRQs enabled
>
> Nearly all already did this in one form or another.
>
> Merge the 32 and 64 bit bits so they no longer have different bugs.
>
> As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated
> irq-enable.
>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> ---
> arch/x86/kernel/apm_32.c | 3 +
> arch/x86/kernel/process.c | 117 +++++++++++++++++++++++++++++++++++++++
> arch/x86/kernel/process_32.c | 118 +---------------------------------------
> arch/x86/kernel/process_64.c | 123 +-----------------------------------------
> drivers/acpi/processor_idle.c | 19 +++---
> include/asm-x86/processor.h | 1
> 6 files changed, 137 insertions(+), 244 deletions(-)
>
> Index: linux/arch/x86/kernel/apm_32.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/apm_32.c
> +++ linux/arch/x86/kernel/apm_32.c
> @@ -904,6 +904,7 @@ recalc:
> original_pm_idle();
> else
> default_idle();
> + local_irq_disable();
> jiffies_since_last_check = jiffies - last_jiffies;
> if (jiffies_since_last_check > idle_period)
> goto recalc;
> @@ -911,6 +912,8 @@ recalc:
>
> if (apm_idle_done)
> apm_do_busy();
> +
> + local_irq_enable();
> }
>
> /**
> Index: linux/arch/x86/kernel/process.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/process.c
> +++ linux/arch/x86/kernel/process.c
> @@ -4,6 +4,8 @@
> #include <linux/smp.h>
> #include <linux/slab.h>
> #include <linux/sched.h>
> +#include <linux/module.h>
> +#include <linux/pm.h>
>
> struct kmem_cache *task_xstate_cachep;
>
> @@ -42,3 +44,118 @@ void arch_task_cache_init(void)
> __alignof__(union thread_xstate),
> SLAB_PANIC | SLAB_NOTRACK, NULL);
> }
> +
> +static void do_nothing(void *unused)
> +{
> +}
> +
> +/*
> + * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
> + * pm_idle and update to new pm_idle value. Required while changing pm_idle
> + * handler on SMP systems.
> + *
> + * Caller must have changed pm_idle to the new value before the call. Old
> + * pm_idle value will not be used by any CPU after the return of this function.
> + */
> +void cpu_idle_wait(void)
> +{
> + smp_mb();
> + /* kick all the CPUs so that they exit out of pm_idle */
> + smp_call_function(do_nothing, NULL, 0, 1);
> +}
> +EXPORT_SYMBOL_GPL(cpu_idle_wait);
> +
> +/*
> + * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> + * which can obviate IPI to trigger checking of need_resched.
> + * We execute MONITOR against need_resched and enter optimized wait state
> + * through MWAIT. Whenever someone changes need_resched, we would be woken
> + * up from MWAIT (without an IPI).
> + *
> + * New with Core Duo processors, MWAIT can take some hints based on CPU
> + * capability.
> + */
> +void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
> +{
> + if (!need_resched()) {
> + __monitor((void *)&current_thread_info()->flags, 0, 0);
> + smp_mb();
> + if (!need_resched())
> + __mwait(ax, cx);
> + }
> +}
> +
> +/* Default MONITOR/MWAIT with no hints, used for default C1 state */
> +static void mwait_idle(void)
> +{
> + if (!need_resched()) {
> + __monitor((void *)&current_thread_info()->flags, 0, 0);
> + smp_mb();
> + if (!need_resched())
> + __sti_mwait(0, 0);
> + else
> + local_irq_enable();
> + } else
> + local_irq_enable();
> +}
> +
> +
> +static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
> +{
> + if (force_mwait)
> + return 1;
> + /* Any C1 states supported? */
> + return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
> +}
> +
> +/*
> + * On SMP it's slightly faster (but much more power-consuming!)
> + * to poll the ->work.need_resched flag instead of waiting for the
> + * cross-CPU IPI to arrive. Use this option with caution.
> + */
> +static void poll_idle(void)
> +{
> + local_irq_enable();
> + cpu_relax();
> +}
> +
> +void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
> +{
> + static int selected;
> +
> + if (selected)
> + return;
> +#ifdef CONFIG_X86_SMP
> + if (pm_idle == poll_idle && smp_num_siblings > 1) {
> + printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
> + " performance may degrade.\n");
> + }
> +#endif
> + if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
> + /*
> + * Skip, if setup has overridden idle.
> + * One CPU supports mwait => All CPUs supports mwait
> + */
> + if (!pm_idle) {
> + printk(KERN_INFO "using mwait in idle threads.\n");
> + pm_idle = mwait_idle;
> + }
> + }
> + selected = 1;
> +}
> +
> +static int __init idle_setup(char *str)
> +{
> + if (!strcmp(str, "poll")) {
> + printk("using polling idle threads.\n");
> + pm_idle = poll_idle;
> + } else if (!strcmp(str, "mwait"))
> + force_mwait = 1;
> + else
> + return -1;
> +
> + boot_option_idle_override = 1;
> + return 0;
> +}
> +early_param("idle", idle_setup);
> +
> Index: linux/arch/x86/kernel/process_32.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/process_32.c
> +++ linux/arch/x86/kernel/process_32.c
> @@ -111,12 +111,10 @@ void default_idle(void)
> */
> smp_mb();
>
> - local_irq_disable();
> - if (!need_resched()) {
> + if (!need_resched())
> safe_halt(); /* enables interrupts racelessly */
> - local_irq_disable();
> - }
> - local_irq_enable();
> + else
> + local_irq_enable();
> current_thread_info()->status |= TS_POLLING;
> } else {
> local_irq_enable();
> @@ -128,17 +126,6 @@ void default_idle(void)
> EXPORT_SYMBOL(default_idle);
> #endif
>
> -/*
> - * On SMP it's slightly faster (but much more power-consuming!)
> - * to poll the ->work.need_resched flag instead of waiting for the
> - * cross-CPU IPI to arrive. Use this option with caution.
> - */
> -static void poll_idle(void)
> -{
> - local_irq_enable();
> - cpu_relax();
> -}
> -
> #ifdef CONFIG_HOTPLUG_CPU
> #include <asm/nmi.h>
> /* We don't actually take CPU down, just spin without interrupts. */
> @@ -196,6 +183,7 @@ void cpu_idle(void)
> if (cpu_is_offline(cpu))
> play_dead();
>
> + local_irq_disable();
> __get_cpu_var(irq_stat).idle_timestamp = jiffies;
> /* Don't trace irqs off for idle */
> stop_critical_timings();
> @@ -209,104 +197,6 @@ void cpu_idle(void)
> }
> }
>
> -static void do_nothing(void *unused)
> -{
> -}
> -
> -/*
> - * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
> - * pm_idle and update to new pm_idle value. Required while changing pm_idle
> - * handler on SMP systems.
> - *
> - * Caller must have changed pm_idle to the new value before the call. Old
> - * pm_idle value will not be used by any CPU after the return of this function.
> - */
> -void cpu_idle_wait(void)
> -{
> - smp_mb();
> - /* kick all the CPUs so that they exit out of pm_idle */
> - smp_call_function(do_nothing, NULL, 0, 1);
> -}
> -EXPORT_SYMBOL_GPL(cpu_idle_wait);
> -
> -/*
> - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> - * which can obviate IPI to trigger checking of need_resched.
> - * We execute MONITOR against need_resched and enter optimized wait state
> - * through MWAIT. Whenever someone changes need_resched, we would be woken
> - * up from MWAIT (without an IPI).
> - *
> - * New with Core Duo processors, MWAIT can take some hints based on CPU
> - * capability.
> - */
> -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
> -{
> - if (!need_resched()) {
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - smp_mb();
> - if (!need_resched())
> - __sti_mwait(ax, cx);
> - else
> - local_irq_enable();
> - } else
> - local_irq_enable();
> -}
> -
> -/* Default MONITOR/MWAIT with no hints, used for default C1 state */
> -static void mwait_idle(void)
> -{
> - local_irq_enable();
> - mwait_idle_with_hints(0, 0);
> -}
> -
> -static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
> -{
> - if (force_mwait)
> - return 1;
> - /* Any C1 states supported? */
> - return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
> -}
> -
> -void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
> -{
> - static int selected;
> -
> - if (selected)
> - return;
> -#ifdef CONFIG_X86_SMP
> - if (pm_idle == poll_idle && smp_num_siblings > 1) {
> - printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
> - " performance may degrade.\n");
> - }
> -#endif
> - if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
> - /*
> - * Skip, if setup has overridden idle.
> - * One CPU supports mwait => All CPUs supports mwait
> - */
> - if (!pm_idle) {
> - printk(KERN_INFO "using mwait in idle threads.\n");
> - pm_idle = mwait_idle;
> - }
> - }
> - selected = 1;
> -}
> -
> -static int __init idle_setup(char *str)
> -{
> - if (!strcmp(str, "poll")) {
> - printk("using polling idle threads.\n");
> - pm_idle = poll_idle;
> - } else if (!strcmp(str, "mwait"))
> - force_mwait = 1;
> - else
> - return -1;
> -
> - boot_option_idle_override = 1;
> - return 0;
> -}
> -early_param("idle", idle_setup);
> -
> void __show_regs(struct pt_regs *regs, int all)
> {
> unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
> Index: linux/arch/x86/kernel/process_64.c
> ===================================================================
> --- linux.orig/arch/x86/kernel/process_64.c
> +++ linux/arch/x86/kernel/process_64.c
> @@ -107,26 +107,13 @@ void default_idle(void)
> * test NEED_RESCHED:
> */
> smp_mb();
> - local_irq_disable();
> - if (!need_resched()) {
> + if (!need_resched())
> safe_halt(); /* enables interrupts racelessly */
> - local_irq_disable();
> - }
> - local_irq_enable();
> + else
> + local_irq_enable();
> current_thread_info()->status |= TS_POLLING;
> }
>
> -/*
> - * On SMP it's slightly faster (but much more power-consuming!)
> - * to poll the ->need_resched flag instead of waiting for the
> - * cross-CPU IPI to arrive. Use this option with caution.
> - */
> -static void poll_idle(void)
> -{
> - local_irq_enable();
> - cpu_relax();
> -}
> -
> #ifdef CONFIG_HOTPLUG_CPU
> DECLARE_PER_CPU(int, cpu_state);
>
> @@ -207,110 +194,6 @@ void cpu_idle(void)
> }
> }
>
> -static void do_nothing(void *unused)
> -{
> -}
> -
> -/*
> - * cpu_idle_wait - Used to ensure that all the CPUs discard old value of
> - * pm_idle and update to new pm_idle value. Required while changing pm_idle
> - * handler on SMP systems.
> - *
> - * Caller must have changed pm_idle to the new value before the call. Old
> - * pm_idle value will not be used by any CPU after the return of this function.
> - */
> -void cpu_idle_wait(void)
> -{
> - smp_mb();
> - /* kick all the CPUs so that they exit out of pm_idle */
> - smp_call_function(do_nothing, NULL, 0, 1);
> -}
> -EXPORT_SYMBOL_GPL(cpu_idle_wait);
> -
> -/*
> - * This uses new MONITOR/MWAIT instructions on P4 processors with PNI,
> - * which can obviate IPI to trigger checking of need_resched.
> - * We execute MONITOR against need_resched and enter optimized wait state
> - * through MWAIT. Whenever someone changes need_resched, we would be woken
> - * up from MWAIT (without an IPI).
> - *
> - * New with Core Duo processors, MWAIT can take some hints based on CPU
> - * capability.
> - */
> -void mwait_idle_with_hints(unsigned long ax, unsigned long cx)
> -{
> - if (!need_resched()) {
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - smp_mb();
> - if (!need_resched())
> - __mwait(ax, cx);
> - }
> -}
> -
> -/* Default MONITOR/MWAIT with no hints, used for default C1 state */
> -static void mwait_idle(void)
> -{
> - if (!need_resched()) {
> - __monitor((void *)&current_thread_info()->flags, 0, 0);
> - smp_mb();
> - if (!need_resched())
> - __sti_mwait(0, 0);
> - else
> - local_irq_enable();
> - } else {
> - local_irq_enable();
> - }
> -}
> -
> -
> -static int __cpuinit mwait_usable(const struct cpuinfo_x86 *c)
> -{
> - if (force_mwait)
> - return 1;
> - /* Any C1 states supported? */
> - return c->cpuid_level >= 5 && ((cpuid_edx(5) >> 4) & 0xf) > 0;
> -}
> -
> -void __cpuinit select_idle_routine(const struct cpuinfo_x86 *c)
> -{
> - static int selected;
> -
> - if (selected)
> - return;
> -#ifdef CONFIG_X86_SMP
> - if (pm_idle == poll_idle && smp_num_siblings > 1) {
> - printk(KERN_WARNING "WARNING: polling idle and HT enabled,"
> - " performance may degrade.\n");
> - }
> -#endif
> - if (cpu_has(c, X86_FEATURE_MWAIT) && mwait_usable(c)) {
> - /*
> - * Skip, if setup has overridden idle.
> - * One CPU supports mwait => All CPUs supports mwait
> - */
> - if (!pm_idle) {
> - printk(KERN_INFO "using mwait in idle threads.\n");
> - pm_idle = mwait_idle;
> - }
> - }
> - selected = 1;
> -}
> -
> -static int __init idle_setup(char *str)
> -{
> - if (!strcmp(str, "poll")) {
> - printk("using polling idle threads.\n");
> - pm_idle = poll_idle;
> - } else if (!strcmp(str, "mwait"))
> - force_mwait = 1;
> - else
> - return -1;
> -
> - boot_option_idle_override = 1;
> - return 0;
> -}
> -early_param("idle", idle_setup);
> -
> /* Prints also some state that isn't saved in the pt_regs */
> void __show_regs(struct pt_regs * regs, int all)
> {
> Index: linux/drivers/acpi/processor_idle.c
> ===================================================================
> --- linux.orig/drivers/acpi/processor_idle.c
> +++ linux/drivers/acpi/processor_idle.c
> @@ -418,13 +418,12 @@ static void acpi_processor_idle(void)
>
> cx = pr->power.state;
> if (!cx || acpi_idle_suspend) {
> - if (pm_idle_save)
> - pm_idle_save();
> - else
> + if (pm_idle_save) {
> + pm_idle_save(); /* enables IRQs */
> + } else {
> acpi_safe_halt();
> -
> - if (irqs_disabled())
> local_irq_enable();
> + }
>
> return;
> }
> @@ -520,10 +519,12 @@ static void acpi_processor_idle(void)
> * Use the appropriate idle routine, the one that would
> * be used without acpi C-states.
> */
> - if (pm_idle_save)
> - pm_idle_save();
> - else
> + if (pm_idle_save) {
> + pm_idle_save(); /* enables IRQs */
> + } else {
> acpi_safe_halt();
> + local_irq_enable();
> + }
>
> /*
> * TBD: Can't get time duration while in C1, as resumes
> @@ -534,8 +535,6 @@ static void acpi_processor_idle(void)
> * skew otherwise.
> */
> sleep_ticks = 0xFFFFFFFF;
> - if (irqs_disabled())
> - local_irq_enable();
>
> break;
>
> Index: linux/include/asm-x86/processor.h
> ===================================================================
> --- linux.orig/include/asm-x86/processor.h
> +++ linux/include/asm-x86/processor.h
> @@ -725,6 +725,7 @@ static inline void __mwait(unsigned long
>
> static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
> {
> + trace_hardirqs_on();
> /* "mwait %eax, %ecx;" */
> asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
> :: "a" (eax), "c" (ecx));
>
O.K.
I added WARN_ON(!irqs_disabled()); in kernel/sched.c
I'm not experiencing anything different, should I have done this with
a fresh git that was'nt patched with the above?
The numbers were different for me they were at kernel/sched.c line 1123 and 1143
regards;

--
Justin P. Mattock
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/