Re: 2.6.29-rc1 does not boot

From: Ingo Molnar
Date: Mon Jan 12 2009 - 14:00:23 EST

Next message: Christoph Lameter: "Re: regarding the x86_64 zero-based percpu patches"
Previous message: Steve French: "Re: linux-next: Tree for January 12 (cifs vs. staging)"
In reply to: Dieter Ries: "Re: 2.6.29-rc1 does not boot"
Next in thread: Michal Jaegermann: "Re: 2.6.29-rc1 does not boot"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

* Dieter Ries <clip2@xxxxxx> wrote:

> Hi,
>
> Ingo Molnar schrieb:
> >
> > So you did get stacktraces? You might want to boot with
> > CONFIG_PROVE_LOCKING=y, that could also catch and report the lockup
> > scenario.
>
> This is what I have got:
>
> [ 12.340122] =============================================
> [ 12.341044] [ INFO: possible recursive locking detected ]
> [ 12.341044] 2.6.29-rc1-00041-gacd1e11 #143
> [ 12.341044] ---------------------------------------------
> [ 12.341044] events/0/9 is trying to acquire lock:
> [ 12.341044] (events){--..}, at: [<ffffffff80254783>]
> flush_work+0x33/0x100
> [ 12.341044]
>
> [ 12.341044] but task is already holding lock:
>
> [ 12.341044] (events){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
>
> [ 12.341044]
>
> [ 12.341044] other info that might help us debug this:
>
> [ 12.341044] 3 locks held by events/0/9:
>
> [ 12.341044] #0: (events){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
>
> [ 12.341044] #1: ((dbs_work).work){--..}, at: [<ffffffff80253ef7>]
> run_workqueue+0x107/0x230
>
> [ 12.341044] #2: (dbs_mutex){--..}, at: [<ffffffff805692f5>]
> do_dbs_timer+0x25/0x250
>
> [ 12.341044]
>
> [ 12.341044] stack backtrace:
>
> [ 12.341044] Pid: 9, comm: events/0 Not tainted
> 2.6.29-rc1-00041-gacd1e11 #143
> [ 12.341044] Call Trace:
>
> [ 12.341044] [<ffffffff80269e69>] validate_chain+0xb69/0x1200
>
> [ 12.341044] [<ffffffff8026a93e>] __lock_acquire+0x43e/0xa50
>
> [ 12.341044] [<ffffffff8026afa8>] lock_acquire+0x58/0x80
>
> [ 12.341044] [<ffffffff80254783>] ? flush_work+0x33/0x100
>
> [ 12.341044] [<ffffffff802547a8>] flush_work+0x58/0x100
>
> [ 12.341044] [<ffffffff80254783>] ? flush_work+0x33/0x100
>
> [ 12.341044] [<ffffffff80268b7d>] ? trace_hardirqs_on+0xd/0x10
>
> [ 12.341044] [<ffffffff80254a3c>] ? __queue_work+0x3c/0x50
>
> [ 12.341044] [<ffffffff80254ad4>] ? queue_work_on+0x44/0x60
>
> [ 12.341044] [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
>
> [ 12.341044] [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
>
> [ 12.341044] [<ffffffff80254ba3>] work_on_cpu+0x93/0xc0
>
> [ 12.341044] [<ffffffff80253cc0>] ? do_work_for_cpu+0x0/0x20
>
> [ 12.341044] [<ffffffff8021bd20>] ? do_drv_write+0x0/0x60
>
> [ 12.341044] [<ffffffff8025cba1>] ?
> srcu_notifier_call_chain+0x11/0x20
> [ 12.341044] [<ffffffff8021c5c9>] acpi_cpufreq_target+0x239/0x350
>
> [ 12.341044] [<ffffffff80268af2>] ?
> trace_hardirqs_on_caller+0x112/0x190
> [ 12.341044] [<ffffffff80699ac8>] ? mutex_lock_nested+0x228/0x2f0
>
> [ 12.341044] [<ffffffff80565731>] __cpufreq_driver_target+0x81/0x90
>
> [ 12.341044] [<ffffffff8056940e>] do_dbs_timer+0x13e/0x250
>
> [ 12.341044] [<ffffffff805692d0>] ? do_dbs_timer+0x0/0x250
>
> [ 12.341044] [<ffffffff80253f49>] run_workqueue+0x159/0x230
>
> [ 12.341044] [<ffffffff80253ef7>] ? run_workqueue+0x107/0x230
>
> [ 12.341044] [<ffffffff80254d1f>] worker_thread+0xbf/0x120
>
> [ 12.341044] [<ffffffff80258550>] ? autoremove_wake_function+0x0/0x40
>
> [ 12.341044] [<ffffffff80254c60>] ? worker_thread+0x0/0x120
>
> [ 12.341044] [<ffffffff8025811d>] kthread+0x4d/0x80
>
> [ 12.341044] [<ffffffff8020c87a>] child_rip+0xa/0x20
>
> [ 12.341044] [<ffffffff8020c27c>] ? restore_args+0x0/0x30
> [ 12.341044] [<ffffffff802580d0>] ? kthread+0x0/0x80
> [ 12.341044] [<ffffffff8020c870>] ? child_rip+0x0/0x20
>
>
> complete log and config are attached.
>
> The different git hash is because I reverted that revert patch. It is
> exactly the same like the kernel on which I first found the problem,
> only with some debugging enabled now. Maybe I should make more use of
> gits capabilities...
>
> Hope that helps,

thanks, it helps!

the problem isnt even the hotplug lock, but that work_on_cpu() uses the
normal generic kevent workqueue - which workqueue can already contain
items related to the cpufreq code hence it's not safe to call it with any
cpufreq mutex held.

so Mike, i think we could solve this by work_on_cpu() getting its own
workqueue?

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Christoph Lameter: "Re: regarding the x86_64 zero-based percpu patches"
Previous message: Steve French: "Re: linux-next: Tree for January 12 (cifs vs. staging)"
In reply to: Dieter Ries: "Re: 2.6.29-rc1 does not boot"
Next in thread: Michal Jaegermann: "Re: 2.6.29-rc1 does not boot"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]