Re: soft lockup - CPU#1 stuck for 15s! [swapper:0]

From: Parag Warudkar
Date: Mon Dec 17 2007 - 08:08:39 EST


On Dec 17, 2007 3:05 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Sun, 16 Dec 2007, Parag Warudkar wrote:
>
> > On Dec 16, 2007 12:15 AM, Parag Warudkar <parag.warudkar@xxxxxxxxx> wrote:
> > > On Sat, 15 Dec 2007, Parag Warudkar wrote:
> > >
> > > >> I will run it for a little longer just to be sure - but I don't think it
> > > >> will be a problem.
> > >
> > > No problems for last 10 hours - I consider this fixed.
> > >
> >
> > Arghh - spoke 8 hours too soon. I left it running overnight and
> > morning I see a bunch of softlockups - so NO NOT FIXED.
> >
> > Parag
> >
> > BUG: soft lockup - CPU#1 stuck for 13s! [swapper:0]
> >
> > Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #21)
> > EIP: 0060:[<c0533ca6>] EFLAGS: 00000206 CPU: 1
> > EIP is at acpi_idle_enter_simple+0x166/0x1d0
> > EAX: f7829f88 EBX: 00000dab ECX: 00000266 EDX: 00000000
> > ESI: 00000000 EDI: 00c056e5 EBP: 00c0493a ESP: f7829f88
> > DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> > CR0: 8005003b CR2: 080f6c78 CR3: 00718000 CR4: 000006d0
> > DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> > DR6: ffff0ff0 DR7: 00000400
>
> Sigh. You did not have the debug patch applied anymore, which printks
> the timer_list data ? Can you apply it again and provide the output
> please ?
>

This keeps getting more and more weird - This time I was running with
CONFIG_CPU_IDLE=N and I have ton of soft lockups after 14hr uptime.

The timer_list data for a few lockups is below. If you need the full
thing grab it from http://warudkars.net/messages.bz2 .

BUG: soft lockup - CPU#1 stuck for 12s! [kjournald:424]

Pid: 424, comm: kjournald Not tainted (2.6.24-rc5 #24)
EIP: 0060:[<c04bbd57>] EFLAGS: 00000206 CPU: 1
EIP is at __journal_remove_journal_head+0x18/0xe8
EAX: dc75e1c0 EBX: dc75e1c0 ECX: f6d539f0 EDX: dc75e1f8
ESI: dc75e1f8 EDI: 00000009 EBP: f712a000 ESP: f7157f28
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7d18eb4 CR3: 37352000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c04b8680>] __journal_unfile_buffer+0x8/0x11
[<c04bbe64>] journal_put_journal_head+0x3d/0x4b
[<c04ba254>] journal_commit_transaction+0x75d/0xbc9
[<c0428246>] lock_timer_base+0x19/0x35
[<c04bcda0>] kjournald+0xa2/0x1d0
[<c0430e29>] autoremove_wake_function+0x0/0x35
[<c04bccfe>] kjournald+0x0/0x1d0
[<c0430d63>] kthread+0x38/0x5d
[<c0430d2b>] kthread+0x0/0x5d
[<c0404a0f>] kernel_thread_helper+0x7/0x10
=======================
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 39080147128907 nsecs

cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1197846065352886518 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f7157dd4>, tick_sched_timer, S:01, tick_nohz_stop_sched_tick, swapper/0
# expires at 39080272000000 nsecs [in 124871093 nsecs]
#1: <f7157dd4>, it_real_fn, S:01, do_setitimer, syslogd/2522
# expires at 39086475543333 nsecs [in 6328414426 nsecs]
#2: <f7157dd4>, hrtimer_wakeup, S:01, do_nanosleep, atd/3039
# expires at 39332442305024 nsecs [in 252295176117 nsecs]
.expires_next : 39080320000000 nsecs
.hres_active : 1
.nr_events : 81214
.nohz_mode : 2
.idle_tick : 39078000000000 nsecs
.tick_stopped : 0
.idle_jiffies : 9694501
.idle_calls : 171123
.idle_sleeps : 106664
.idle_entrytime : 39080384013047 nsecs
.idle_sleeptime : 38923743508612 nsecs
.last_jiffies : 9695101
.next_jiffies : 9695250
.idle_expires : 39079996000000 nsecs
jiffies: 9695106

cpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1197846065352886518 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f7157dd4>, hrtimer_wakeup, S:01, do_nanosleep, crond/2946
# expires at 39115730211364 nsecs [in 35583082457 nsecs]
#1: <f7157dd4>, hrtimer_wakeup, S:01, do_nanosleep, smartd/3480
# expires at 39641838017337 nsecs [in 561690888430 nsecs]
.expires_next : 39079997000000 nsecs
.hres_active : 1
.nr_events : 48789
.nohz_mode : 2
.idle_tick : 39066865000000 nsecs
.tick_stopped : 0
.idle_jiffies : 9691717
.idle_calls : 125065
.idle_sleeps : 79815
.idle_entrytime : 39079996603653 nsecs
.idle_sleeptime : 38923914717954 nsecs
.last_jiffies : 9695000
.next_jiffies : 9695467
.idle_expires : 39081864000000 nsecs
jiffies: 9695162


Tick Device: mode: 1
Clock Event Device: hpet
max_delta_ns: 2147483647
min_delta_ns: 3352
mult: 61496110
shift: 32
mode: 3
next_event: 39080704000000 nsecs
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000003
tick_broadcast_oneshot_mask: 00000001


Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 805378776
min_delta_ns: 1440
mult: 44735215
shift: 32
mode: 1
next_event: 39080796000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 805378776
min_delta_ns: 1440
mult: 44735215
shift: 32
mode: 3
next_event: 9223372036854775807 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

BUG: soft lockup - CPU#1 stuck for 12s! [kjournald:424]

Pid: 424, comm: kjournald Not tainted (2.6.24-rc5 #24)
EIP: 0060:[<f8bb3e78>] EFLAGS: 00000202 CPU: 1
EIP is at scsi_request_fn+0x265/0x326 [scsi_mod]
EAX: f7bb8000 EBX: f7bb8000 ECX: f70fe028 EDX: f70fe028
ESI: f70fe000 EDI: f7105400 EBP: f70de408 ESP: f7157e18
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7d18eb4 CR3: 37d98000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c04e4ca4>] blk_remove_plug+0x4f/0x5b
[<c04e4ccd>] __generic_unplug_device+0x1d/0x1f
[<c04e58e8>] generic_unplug_device+0x15/0x21
[<c04e3919>] blk_unplug+0x72/0x7a
[<f8b94813>] dm_table_unplug_all+0x1e/0x27 [dm_mod]
[<f8b92c63>] dm_unplug_all+0x17/0x21 [dm_mod]
[<c04e3919>] blk_unplug+0x72/0x7a
[<c0488fe1>] sync_buffer+0x2b/0x33
[<c0602e5a>] __wait_on_bit+0x33/0x58
[<c0488fb6>] sync_buffer+0x0/0x33
[<c0488fb6>] sync_buffer+0x0/0x33
[<c0602ee2>] out_of_line_wait_on_bit+0x63/0x6b
[<c0430e5e>] wake_bit_function+0x0/0x3c
[<c0488f80>] __wait_on_buffer+0x24/0x27
[<c0489d72>] sync_dirty_buffer+0x7f/0xb4
[<c04bd21f>] journal_get_descriptor_buffer+0x79/0x7f
[<c04ba3d3>] journal_commit_transaction+0x8dc/0xbc9
[<c0428246>] lock_timer_base+0x19/0x35
[<c04bcda0>] kjournald+0xa2/0x1d0
[<c0430e29>] autoremove_wake_function+0x0/0x35
[<c04bccfe>] kjournald+0x0/0x1d0
[<c0430d63>] kthread+0x38/0x5d
[<c0430d2b>] kthread+0x0/0x5d
[<c0404a0f>] kernel_thread_helper+0x7/0x10
=======================
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 40490254040892 nsecs

cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1197846065352886518 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f7157cc4>, tick_sched_timer, S:01, tick_nohz_stop_sched_tick, swapper/0
# expires at 40490384000000 nsecs [in 129959108 nsecs]
#1: <f7157cc4>, it_real_fn, S:01, do_setitimer, syslogd/2522
# expires at 40496478002707 nsecs [in 6223961815 nsecs]
#2: <f7157cc4>, hrtimer_wakeup, S:01, do_nanosleep, atd/3039
# expires at 40532442653957 nsecs [in 42188613065 nsecs]
.expires_next : 40490428000000 nsecs
.hres_active : 1
.nr_events : 83410
.nohz_mode : 2
.idle_tick : 40489800000000 nsecs
.tick_stopped : 0
.idle_jiffies : 10047451
.idle_calls : 174800
.idle_sleeps : 109338
.idle_entrytime : 40490492012833 nsecs
.idle_sleeptime : 40331022857511 nsecs
.last_jiffies : 10047628
.next_jiffies : 10048000
.idle_expires : 40489996000000 nsecs
jiffies: 10047633

cpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1197846065352886518 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f7157cc4>, hrtimer_wakeup, S:01, do_nanosleep, crond/2946
# expires at 40495766729200 nsecs [in 5512688308 nsecs]
#1: <f7157cc4>, hrtimer_wakeup, S:01, do_nanosleep, smartd/3480
# expires at 41441988869277 nsecs [in 951734828385 nsecs]
.expires_next : 40489997000000 nsecs
.hres_active : 1
.nr_events : 49576
.nohz_mode : 2
.idle_tick : 40476865000000 nsecs
.tick_stopped : 0
.idle_jiffies : 10044217
.idle_calls : 127015
.idle_sleeps : 81254
.idle_entrytime : 40489996578090 nsecs
.idle_sleeptime : 40331714407873 nsecs
.last_jiffies : 10047500
.next_jiffies : 10047967
.idle_expires : 40491864000000 nsecs
jiffies: 10047688


Tick Device: mode: 1
Clock Event Device: hpet
max_delta_ns: 2147483647
min_delta_ns: 3352
mult: 61496110
shift: 32
mode: 3
next_event: 40490812000000 nsecs
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000003
tick_broadcast_oneshot_mask: 00000001


Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 805378776
min_delta_ns: 1440
mult: 44735215
shift: 32
mode: 1
next_event: 40490904000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 805378776
min_delta_ns: 1440
mult: 44735215
shift: 32
mode: 3
next_event: 9223372036854775807 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

BUG: soft lockup - CPU#1 stuck for 12s! [swapper:0]

Pid: 0, comm: swapper Not tainted (2.6.24-rc5 #24)
EIP: 0060:[<c0603b70>] EFLAGS: 00000292 CPU: 1
EIP is at _spin_unlock_irqrestore+0x5/0x6
EAX: f7bdd24c EBX: f7110000 ECX: f70cc000 EDX: 00000292
ESI: f70cc07c EDI: 00000050 EBP: 00000064 ESP: f7829ee8
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: b7d18eb4 CR3: 37352000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<f8bd96ca>] ata_interrupt+0x1a8/0x1ba [libata]
[<c044aa0a>] handle_IRQ_event+0x1a/0x3f
[<c044bc42>] handle_fasteoi_irq+0x71/0xa4
[<c0405e8a>] do_IRQ+0x79/0x93
[<c04047cf>] common_interrupt+0x23/0x28
[<c041007b>] acpi_map_lsapic+0x2d/0xda
[<c0533a75>] acpi_processor_idle+0x293/0x43e
[<c05337e2>] acpi_processor_idle+0x0/0x43e
[<c05337e2>] acpi_processor_idle+0x0/0x43e
[<c0402575>] cpu_idle+0x97/0xb8
=======================
Timer List Version: v0.3
HRTIMER_MAX_CLOCK_BASES: 2
now at 40700144217096 nsecs

cpu: 0
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1197846065352886518 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f7829d94>, tick_sched_timer, S:01, tick_nohz_stop_sched_tick, swapper/0
# expires at 40700264000000 nsecs [in 119782904 nsecs]
#1: <f7829d94>, it_real_fn, S:01, do_setitimer, syslogd/2522
# expires at 40706478360651 nsecs [in 6334143555 nsecs]
#2: <f7829d94>, hrtimer_wakeup, S:01, do_nanosleep, atd/3039
# expires at 40832442740683 nsecs [in 132298523587 nsecs]
.expires_next : 40700308000000 nsecs
.hres_active : 1
.nr_events : 83948
.nohz_mode : 2
.idle_tick : 40699800000000 nsecs
.tick_stopped : 0
.idle_jiffies : 10099951
.idle_calls : 175667
.idle_sleeps : 109741
.idle_entrytime : 40700372012887 nsecs
.idle_sleeptime : 40539661547333 nsecs
.last_jiffies : 10100098
.next_jiffies : 10100500
.idle_expires : 40699996000000 nsecs
jiffies: 10100103

cpu: 1
clock 0:
.index: 0
.resolution: 1 nsecs
.get_time: ktime_get_real
.offset: 1197846065352886518 nsecs
active timers:
clock 1:
.index: 1
.resolution: 1 nsecs
.get_time: ktime_get
.offset: 0 nsecs
active timers:
#0: <f7829d94>, hrtimer_wakeup, S:01, do_nanosleep, crond/2946
# expires at 40735767233236 nsecs [in 35623016140 nsecs]
#1: <f7829d94>, hrtimer_wakeup, S:01, do_nanosleep, smartd/3480
# expires at 41441988869277 nsecs [in 741844652181 nsecs]
.expires_next : 40699997000000 nsecs
.hres_active : 1
.nr_events : 49671
.nohz_mode : 2
.idle_tick : 40686865000000 nsecs
.tick_stopped : 0
.idle_jiffies : 10096717
.idle_calls : 127753
.idle_sleeps : 81895
.idle_entrytime : 40699996620694 nsecs
.idle_sleeptime : 40540536046589 nsecs
.last_jiffies : 10100000
.next_jiffies : 10100467
.idle_expires : 40701864000000 nsecs
jiffies: 10100158


Tick Device: mode: 1
Clock Event Device: hpet
max_delta_ns: 2147483647
min_delta_ns: 3352
mult: 61496110
shift: 32
mode: 3
next_event: 40700692000000 nsecs
set_next_event: hpet_legacy_next_event
set_mode: hpet_legacy_set_mode
event_handler: tick_handle_oneshot_broadcast
tick_broadcast_mask: 00000003
tick_broadcast_oneshot_mask: 00000003


Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 805378776
min_delta_ns: 1440
mult: 44735215
shift: 32
mode: 1
next_event: 40700784000000 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt

Tick Device: mode: 1
Clock Event Device: lapic
max_delta_ns: 805378776
min_delta_ns: 1440
mult: 44735215
shift: 32
mode: 1
next_event: 9223372036854775807 nsecs
set_next_event: lapic_next_event
set_mode: lapic_timer_setup
event_handler: hrtimer_interrupt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/