Re: file_splice_read problem in 2.6.24.2?

From: Tristan Linnenbank
Date: Thu Jun 05 2008 - 02:59:20 EST


Jens Axboe wrote:
So either this is fixed by this:

http://git.kernel.dk/?p=linux-2.6.git;a=commit;h=8191ecd1d14c6914c660dfa007154860a7908857

or it's a different bug. You should post the full oops (including any
message that came before the oops, like the 'locked up for foo seconds'
in the urls you reference above) with the Code line at the bottom as
well so we can see what the registers are used for.

If it's the bug fixed with the above commit, then 2.6.25.x should
work. Unfortunately I'm unsure of the -stable status of the above
patch.

thanks for your reply.

I appended five of the bunch of errors to this mail. They all lock the CPU for 11 seconds (just like the nfsd errors we had in February/April), so that could be a sign of them being the same bug.

It seems to be the same problem. We've only seen this behaviour once on the one machine though. I'll keep a couple of webservers on 2.6.24.2 and some on 2.6.25.4, just to see what happens.

Thanks!

Kind regards,

Tristan

Jun 4 15:08:38 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck for 11s! [apache2:22361]
Jun 4 15:08:38 web10.c1.internal kernel: Jun 4 15:08:38 web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted (2.6.24.2-fwsh-byte #2)
Jun 4 15:08:38 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
Jun 4 15:08:38 web10.c1.internal kernel: EIP is at find_get_pages_contig+0x67/0x73
Jun 4 15:08:38 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 ECX: c1c75e20 EDX: c1c75e20
Jun 4 15:08:38 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 EBP: 00000010 ESP: d43b7cd8
Jun 4 15:08:38 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jun 4 15:08:38 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 CR3: 0c78a000 CR4: 000006f0
Jun 4 15:08:38 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jun 4 15:08:38 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun 4 15:08:38 web10.c1.internal kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Jun 4 15:08:38 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun 4 15:08:38 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun 4 15:08:38 web10.c1.internal kernel: [<c0131291>] getnstimeofday+0x31/0x105
Jun 4 15:08:38 web10.c1.internal kernel: [<c0122ed9>] lock_timer_base+0x27/0x51
Jun 4 15:08:38 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun 4 15:08:38 web10.c1.internal kernel: [<c040dde0>] tcp_keepalive_timer+0x0/0x1c4
Jun 4 15:08:38 web10.c1.internal kernel: [<c03d03ce>] sk_reset_timer+0xc/0x16
Jun 4 15:08:38 web10.c1.internal kernel: [<c040dd9b>] tcp_synack_timer+0x19/0x1d
Jun 4 15:08:38 web10.c1.internal kernel: [<c040df9c>] tcp_keepalive_timer+0x1bc/0x1c4
Jun 4 15:08:38 web10.c1.internal kernel: [<c0123289>] run_timer_softirq+0xcf/0x184
Jun 4 15:08:38 web10.c1.internal kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Jun 4 15:08:38 web10.c1.internal kernel: [<c011f979>] tasklet_action+0x53/0x93
Jun 4 15:08:38 web10.c1.internal kernel: [<c011f754>] __do_softirq+0xba/0xcf
Jun 4 15:08:38 web10.c1.internal kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Jun 4 15:08:38 web10.c1.internal kernel: [<c01eda5c>] nfs_file_splice_read+0x67/0x9d
Jun 4 15:08:38 web10.c1.internal kernel: [<c017d083>] do_splice_to+0x6e/0x90
Jun 4 15:08:38 web10.c1.internal kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Jun 4 15:08:38 web10.c1.internal kernel: [<c017d20b>] direct_splice_actor+0x0/0x31
Jun 4 15:08:38 web10.c1.internal kernel: [<c017d2a4>] do_splice_direct+0x68/0x8b
Jun 4 15:08:38 web10.c1.internal kernel: [<c016141a>] do_readv_writev+0x130/0x193
Jun 4 15:08:38 web10.c1.internal kernel: [<c01617ff>] do_sendfile+0x1f5/0x256
Jun 4 15:08:38 web10.c1.internal kernel: [<c01618b8>] sys_sendfile+0x58/0xa5
Jun 4 15:08:38 web10.c1.internal kernel: [<c0102836>] sysenter_past_esp+0x5f/0x85
Jun 4 15:08:38 web10.c1.internal kernel: =======================
Jun 4 15:08:50 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck for 11s! [apache2:22361]
Jun 4 15:08:50 web10.c1.internal kernel: Jun 4 15:08:50 web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted (2.6.24.2-fwsh-byte #2)
Jun 4 15:08:50 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
Jun 4 15:08:50 web10.c1.internal kernel: EIP is at find_get_pages_contig+0x67/0x73
Jun 4 15:08:50 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 ECX: c1c75e20 EDX: c1c75e20
Jun 4 15:08:50 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 EBP: 00000010 ESP: d43b7cd8
Jun 4 15:08:50 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jun 4 15:08:50 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 CR3: 0c78a000 CR4: 000006f0
Jun 4 15:08:50 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jun 4 15:08:50 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun 4 15:08:50 web10.c1.internal kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Jun 4 15:08:50 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun 4 15:08:50 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun 4 15:08:50 web10.c1.internal kernel: [<c0131291>] getnstimeofday+0x31/0x105
Jun 4 15:08:50 web10.c1.internal kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Jun 4 15:08:50 web10.c1.internal kernel: [<c012ef49>] ktime_get_ts+0x15/0x47
Jun 4 15:08:50 web10.c1.internal kernel: [<c01231ea>] run_timer_softirq+0x30/0x184
Jun 4 15:08:50 web10.c1.internal kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Jun 4 15:08:50 web10.c1.internal kernel: [<c011f979>] tasklet_action+0x53/0x93
Jun 4 15:08:50 web10.c1.internal kernel: [<c011f754>] __do_softirq+0xba/0xcf
Jun 4 15:08:50 web10.c1.internal kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Jun 4 15:08:50 web10.c1.internal kernel: [<c01eda5c>] nfs_file_splice_read+0x67/0x9d
Jun 4 15:08:50 web10.c1.internal kernel: [<c017d083>] do_splice_to+0x6e/0x90
Jun 4 15:08:50 web10.c1.internal kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Jun 4 15:08:50 web10.c1.internal kernel: [<c017d20b>] direct_splice_actor+0x0/0x31
Jun 4 15:08:50 web10.c1.internal kernel: [<c017d2a4>] do_splice_direct+0x68/0x8b
Jun 4 15:08:50 web10.c1.internal kernel: [<c016141a>] do_readv_writev+0x130/0x193
Jun 4 15:08:50 web10.c1.internal kernel: [<c01617ff>] do_sendfile+0x1f5/0x256
Jun 4 15:08:50 web10.c1.internal kernel: [<c01618b8>] sys_sendfile+0x58/0xa5
Jun 4 15:08:50 web10.c1.internal kernel: [<c0102836>] sysenter_past_esp+0x5f/0x85
Jun 4 15:08:51 web10.c1.internal kernel: =======================
Jun 4 15:09:02 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck for 11s! [apache2:22361]
Jun 4 15:09:02 web10.c1.internal kernel: Jun 4 15:09:02 web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted (2.6.24.2-fwsh-byte #2)
Jun 4 15:09:02 web10.c1.internal kernel: EIP: 0060:[<c0147bde>] EFLAGS: 00000246 CPU: 0
Jun 4 15:09:02 web10.c1.internal kernel: EIP is at put_page+0x7/0x20
Jun 4 15:09:02 web10.c1.internal kernel: EAX: 80000028 EBX: 00000010 ECX: 00000010 EDX: c2180ea0
Jun 4 15:09:02 web10.c1.internal kernel: ESI: 00000000 EDI: de5cb870 EBP: 00000000 ESP: d43b7ce8
Jun 4 15:09:02 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jun 4 15:09:02 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 CR3: 0c78a000 CR4: 000006f0
Jun 4 15:09:02 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jun 4 15:09:02 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun 4 15:09:02 web10.c1.internal kernel: [<c017c6bc>] __generic_file_splice_read+0x2c2/0x41e
Jun 4 15:09:02 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun 4 15:09:02 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun 4 15:09:02 web10.c1.internal kernel: [<c0131291>] getnstimeofday+0x31/0x105
Jun 4 15:09:02 web10.c1.internal kernel: [<f905ee38>] kcs_event+0xb0/0x690 [ipmi_si]
Jun 4 15:09:02 web10.c1.internal kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Jun 4 15:09:02 web10.c1.internal kernel: [<f905c07d>] start_next_msg+0x14/0xa1 [ipmi_si]
Jun 4 15:09:02 web10.c1.internal kernel: [<c0122ed9>] lock_timer_base+0x27/0x51
Jun 4 15:09:02 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun 4 15:09:02 web10.c1.internal kernel: [<f905c9ba>] smi_timeout+0x0/0xfe [ipmi_si]
Jun 4 15:09:02 web10.c1.internal kernel: [<c0123289>] run_timer_softirq+0xcf/0x184
Jun 4 15:09:02 web10.c1.internal kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Jun 4 15:09:02 web10.c1.internal kernel: [<c011f979>] tasklet_action+0x53/0x93
Jun 4 15:09:02 web10.c1.internal kernel: [<c011f754>] __do_softirq+0xba/0xcf
Jun 4 15:09:02 web10.c1.internal kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Jun 4 15:09:02 web10.c1.internal kernel: [<c01eda5c>] nfs_file_splice_read+0x67/0x9d
Jun 4 15:09:02 web10.c1.internal kernel: [<c017d083>] do_splice_to+0x6e/0x90
Jun 4 15:09:02 web10.c1.internal kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Jun 4 15:09:02 web10.c1.internal kernel: [<c017d20b>] direct_splice_actor+0x0/0x31
Jun 4 15:09:02 web10.c1.internal kernel: [<c017d2a4>] do_splice_direct+0x68/0x8b
Jun 4 15:09:02 web10.c1.internal kernel: [<c016141a>] do_readv_writev+0x130/0x193
Jun 4 15:09:02 web10.c1.internal kernel: [<c01617ff>] do_sendfile+0x1f5/0x256
Jun 4 15:09:02 web10.c1.internal kernel: [<c01618b8>] sys_sendfile+0x58/0xa5
Jun 4 15:09:02 web10.c1.internal kernel: [<c0102836>] sysenter_past_esp+0x5f/0x85
Jun 4 15:09:03 web10.c1.internal kernel: =======================
Jun 4 15:09:14 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck for 11s! [apache2:22361]
Jun 4 15:09:14 web10.c1.internal kernel: Jun 4 15:09:14 web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted (2.6.24.2-fwsh-byte #2)
Jun 4 15:09:14 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
Jun 4 15:09:14 web10.c1.internal kernel: EIP is at find_get_pages_contig+0x67/0x73
Jun 4 15:09:14 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 ECX: c1c75e20 EDX: c1c75e20
Jun 4 15:09:14 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 EBP: 00000010 ESP: d43b7cd8
Jun 4 15:09:14 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jun 4 15:09:14 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 CR3: 0c78a000 CR4: 000006f0
Jun 4 15:09:14 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jun 4 15:09:14 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun 4 15:09:14 web10.c1.internal kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Jun 4 15:09:14 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun 4 15:09:14 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun 4 15:09:14 web10.c1.internal kernel: [<c0131291>] getnstimeofday+0x31/0x105
Jun 4 15:09:14 web10.c1.internal kernel: [<f905ee38>] kcs_event+0xb0/0x690 [ipmi_si]
Jun 4 15:09:14 web10.c1.internal kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Jun 4 15:09:14 web10.c1.internal kernel: [<f905c07d>] start_next_msg+0x14/0xa1 [ipmi_si]
Jun 4 15:09:14 web10.c1.internal kernel: [<c0122ed9>] lock_timer_base+0x27/0x51
Jun 4 15:09:14 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun 4 15:09:14 web10.c1.internal kernel: [<f905c9ba>] smi_timeout+0x0/0xfe [ipmi_si]
Jun 4 15:09:14 web10.c1.internal kernel: [<c0123289>] run_timer_softirq+0xcf/0x184
Jun 4 15:09:14 web10.c1.internal kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Jun 4 15:09:14 web10.c1.internal kernel: [<c011f979>] tasklet_action+0x53/0x93
Jun 4 15:09:14 web10.c1.internal kernel: [<c011f754>] __do_softirq+0xba/0xcf
Jun 4 15:09:14 web10.c1.internal kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Jun 4 15:09:14 web10.c1.internal kernel: [<c01eda5c>] nfs_file_splice_read+0x67/0x9d
Jun 4 15:09:14 web10.c1.internal kernel: [<c017d083>] do_splice_to+0x6e/0x90
Jun 4 15:09:14 web10.c1.internal kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Jun 4 15:09:14 web10.c1.internal kernel: [<c017d20b>] direct_splice_actor+0x0/0x31
Jun 4 15:09:14 web10.c1.internal kernel: [<c017d2a4>] do_splice_direct+0x68/0x8b
Jun 4 15:09:14 web10.c1.internal kernel: [<c016141a>] do_readv_writev+0x130/0x193
Jun 4 15:09:14 web10.c1.internal kernel: [<c01617ff>] do_sendfile+0x1f5/0x256
Jun 4 15:09:14 web10.c1.internal kernel: [<c01618b8>] sys_sendfile+0x58/0xa5
Jun 4 15:09:14 web10.c1.internal kernel: [<c0102836>] sysenter_past_esp+0x5f/0x85
Jun 4 15:09:15 web10.c1.internal kernel: =======================
Jun 4 15:09:27 web10.c1.internal kernel: BUG: soft lockup - CPU#0 stuck for 11s! [apache2:22361]
Jun 4 15:09:27 web10.c1.internal kernel: Jun 4 15:09:27 web10.c1.internal kernel: Pid: 22361, comm: apache2 Not tainted (2.6.24.2-fwsh-byte #2)
Jun 4 15:09:27 web10.c1.internal kernel: EIP: 0060:[<c0140967>] EFLAGS: 00000286 CPU: 0
Jun 4 15:09:27 web10.c1.internal kernel: EIP is at find_get_pages_contig+0x67/0x73
Jun 4 15:09:27 web10.c1.internal kernel: EAX: 00000000 EBX: 00000010 ECX: c1c75e20 EDX: c1c75e20
Jun 4 15:09:27 web10.c1.internal kernel: ESI: 00000010 EDI: de5cb920 EBP: 00000010 ESP: d43b7cd8
Jun 4 15:09:27 web10.c1.internal kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jun 4 15:09:27 web10.c1.internal kernel: CR0: 8005003b CR2: b77f8e04 CR3: 0c78a000 CR4: 000006f0
Jun 4 15:09:27 web10.c1.internal kernel: DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Jun 4 15:09:27 web10.c1.internal kernel: DR6: ffff0ff0 DR7: 00000400
Jun 4 15:09:27 web10.c1.internal kernel: [<c017c49c>] __generic_file_splice_read+0xa2/0x41e
Jun 4 15:09:27 web10.c1.internal kernel: [<c0132efc>] clocksource_get_next+0x3a/0x40
Jun 4 15:09:27 web10.c1.internal kernel: [<c01315cb>] change_clocksource+0xc/0x205
Jun 4 15:09:27 web10.c1.internal kernel: [<c0113b11>] sched_slice+0x15/0x6f
Jun 4 15:09:27 web10.c1.internal kernel: [<c0110cb8>] read_hpet+0xa/0xd
Jun 4 15:09:27 web10.c1.internal kernel: [<c0131291>] getnstimeofday+0x31/0x105
Jun 4 15:09:27 web10.c1.internal kernel: [<f905ee38>] kcs_event+0xb0/0x690 [ipmi_si]
Jun 4 15:09:27 web10.c1.internal kernel: [<c0134301>] clockevents_program_event+0xbf/0x134
Jun 4 15:09:27 web10.c1.internal kernel: [<f905c07d>] start_next_msg+0x14/0xa1 [ipmi_si]
Jun 4 15:09:27 web10.c1.internal kernel: [<c0122ed9>] lock_timer_base+0x27/0x51
Jun 4 15:09:27 web10.c1.internal kernel: [<c0122f83>] __mod_timer+0x80/0x8e
Jun 4 15:09:27 web10.c1.internal kernel: [<f905c9ba>] smi_timeout+0x0/0xfe [ipmi_si]
Jun 4 15:09:27 web10.c1.internal kernel: [<c0123289>] run_timer_softirq+0xcf/0x184
Jun 4 15:09:27 web10.c1.internal kernel: [<c012a893>] __rcu_process_callbacks+0x76/0xbb
Jun 4 15:09:27 web10.c1.internal kernel: [<c011f979>] tasklet_action+0x53/0x93
Jun 4 15:09:27 web10.c1.internal kernel: [<c011f754>] __do_softirq+0xba/0xcf
Jun 4 15:09:27 web10.c1.internal kernel: [<c017c88d>] generic_file_splice_read+0x75/0xc9
Jun 4 15:09:27 web10.c1.internal kernel: [<c01eda5c>] nfs_file_splice_read+0x67/0x9d
Jun 4 15:09:27 web10.c1.internal kernel: [<c017d083>] do_splice_to+0x6e/0x90
Jun 4 15:09:27 web10.c1.internal kernel: [<c017d144>] splice_direct_to_actor+0x9f/0x166
Jun 4 15:09:27 web10.c1.internal kernel: [<c017d20b>] direct_splice_actor+0x0/0x31
Jun 4 15:09:27 web10.c1.internal kernel: [<c017d2a4>] do_splice_direct+0x68/0x8b
Jun 4 15:09:27 web10.c1.internal kernel: [<c016141a>] do_readv_writev+0x130/0x193
Jun 4 15:09:27 web10.c1.internal kernel: [<c01617ff>] do_sendfile+0x1f5/0x256
Jun 4 15:09:27 web10.c1.internal kernel: [<c01618b8>] sys_sendfile+0x58/0xa5
Jun 4 15:09:27 web10.c1.internal kernel: [<c0102836>] sysenter_past_esp+0x5f/0x85
Jun 4 15:09:27 web10.c1.internal kernel: =======================

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/