Re: crash in active_task, 3.0-rc6+

From: Ben Greear
Date: Tue Jul 19 2011 - 12:10:27 EST


On 07/19/2011 07:28 AM, Paul E. McKenney wrote:
On Tue, Jul 19, 2011 at 01:08:21PM +0200, Peter Zijlstra wrote:
On Mon, 2011-07-11 at 11:01 -0700, Ben Greear wrote:
This is from a lightly patched kernel (nfs related patches)
Same system sees lockdep warnings related to scheduling that
I've been posting as well...

I haven't seen anything similar in older kernels, so this may
be a regression. We tested .38 in this setup without seeing it,
but havn't tested .39.

[root@simech2 ~]# BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
IP: [<ffffffff81231bf8>] do_raw_spin_lock+0x14/0x13c
PGD 127082067 PUD 127081067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 2
Modules linked in: 8021q garp xt_addrtype xt_TPROXY nf_tproxy_core xt_socket nf_defrag_ipv6 xt_set ip_set nfnetlink xt_connlimit ip]

Pid: 2203, comm: gnuserver Not tainted 3.0.0-rc6+ #20 Supermicro X7DBU/X7DBU
RIP: 0010:[<ffffffff81231bf8>] [<ffffffff81231bf8>] do_raw_spin_lock+0x14/0x13c
RSP: 0018:ffff8801104e3628 EFLAGS: 00010096
RAX: ffff880126b9e7e0 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff8801104e3668 R08: 0000000000000002 R09: 0000000000000001
R10: 000002bf7b96a212 R11: ffff880107481e30 R12: 0000000000000003
R13: 0000000000000000 R14: 0000000000000046 R15: 0000000000000002
FS: 00007f213f07c720(0000) GS:ffff88012fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 0000000110519000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process gnuserver (pid: 2203, threadinfo ffff8801104e2000, task ffff880126b9e7e0)
Stack:
0000000000000018 0000000000000046 00000000bf3d0800 0000000000000000
0000000000000003 0000000000000000 0000000000000046 0000000000000002
ffff8801104e3698 ffffffff8147e459 ffffffff810a7f25 ffffffff8103d2f7
Call Trace:
[<ffffffff8147e459>] _raw_spin_lock+0x3e/0x45
[<ffffffff810a7f25>] ? __rcu_read_unlock+0xdc/0x24e
[<ffffffff8103d2f7>] ? activate_task+0x30/0x30
[<ffffffff810a7f25>] __rcu_read_unlock+0xdc/0x24e
[<ffffffff8103d34d>] rcu_read_unlock+0x21/0x23
[<ffffffff8103efd7>] select_task_rq_fair+0x8cc/0x8e1
[<ffffffff81042b59>] ? finish_task_switch+0x78/0xf1
[<ffffffff81042b1b>] ? finish_task_switch+0x3a/0xf1
[<ffffffff810468d0>] ? try_to_wake_up+0x29/0x1a0
[<ffffffff810468d0>] ? try_to_wake_up+0x29/0x1a0
[<ffffffff81040c52>] select_task_rq+0x13/0x44
[<ffffffff8104697f>] try_to_wake_up+0xd8/0x1a0
[<ffffffff8103c783>] ? __wake_up+0x1d/0x48
[<ffffffff81046a54>] default_wake_function+0xd/0xf
[<ffffffff8106751a>] autoremove_wake_function+0x13/0x38
[<ffffffff810395d0>] __wake_up_common+0x49/0x7f
[<ffffffff8103c79a>] __wake_up+0x34/0x48
[<ffffffff810a74a9>] rcu_report_exp_rnp+0x50/0x89
[<ffffffff810a7f25>] ? __rcu_read_unlock+0xdc/0x24e
[<ffffffff810a8032>] __rcu_read_unlock+0x1e9/0x24e
[<ffffffff813fce40>] rcu_read_unlock+0x21/0x23
[<ffffffff813fd52d>] ip_queue_xmit+0x35e/0x3b1
[<ffffffff813fd1cf>] ? ip_send_reply+0x247/0x247
[<ffffffff8140f5f3>] tcp_transmit_skb+0x785/0x7c3
[<ffffffff81411e23>] tcp_write_xmit+0x806/0x8f5
[<ffffffff810e646f>] ? might_fault+0x4e/0x9e
[<ffffffff81403e25>] ? copy_from_user+0x2a/0x2c
[<ffffffff81411f63>] __tcp_push_pending_frames+0x20/0x4d
[<ffffffff8140411f>] tcp_push+0x84/0x86
[<ffffffff81406577>] tcp_sendmsg+0x674/0x775
[<ffffffff81423d68>] inet_sendmsg+0x61/0x6a
[<ffffffff813af67a>] __sock_sendmsg_nosec+0x58/0x61
[<ffffffff813b0db5>] __sock_sendmsg+0x3d/0x48
[<ffffffff813b1631>] sock_sendmsg+0xa3/0xbc
[<ffffffff810ea6ae>] ? handle_pte_fault+0x7fc/0x84d
[<ffffffff81110b14>] ? mem_cgroup_get_limit+0x45/0x45
[<ffffffff8111a4e9>] ? fget_light+0x35/0xac
[<ffffffff813b16b2>] ? sockfd_lookup_light+0x1b/0x53
[<ffffffff813b1bf5>] sys_sendto+0xfa/0x11f
[<ffffffff8147f0d8>] ? retint_restore_args+0x13/0x13
[<ffffffff8109f0a1>] ? audit_syscall_entry+0x119/0x145
[<ffffffff81484d52>] system_call_fastpath+0x16/0x1b


Fun, that's a recursion we've been poking at for the past few days, but
I haven't seen it do a NULL-ptr deref like this yet.

I'm also not quite sure the below patches will catch this particular
recursion, it might be you need the patch from
lkml.kernel.org/r/20110718151524.GA4236@xxxxxxxxxxxxxxxxxx as well,
since this is RCU recursing on itself without the aid of interrupts.

How reproducable is this? And if so, what's your .config and recipe for
making it go bang?

Yow!

I bet that the .config includes CONFIG_RCU_BOOST=y. If so, the following
patch will be needed. Otherwise, I have no idea how the above sequence
of events happened -- all of the ->rcu_read_unlock_special bits should
be cleared by the time that rcu_report_exp_rnp() is invoked. Though
the patch that Peter pointed you at on lkml.kernel.org above would likely
make it go away in conjunction with the patch below.

Thanx, Paul

Yes, I was running with boost enabled:

[greearb@fs3 linux-3.0.x64-nfs]$ grep BOOST .config
CONFIG_RCU_BOOST=y
CONFIG_RCU_BOOST_PRIO=1
CONFIG_RCU_BOOST_DELAY=500

Work load was mounting 200 nfs mounts, doing O_DIRECT io (200 processes, one per mount) for 30
seconds or so, unmount, repeat..for hours. I could reliably crash the system on over-night
runs even after fixing the NFS bugs, but it crashed in various ways.

I'll re-run the tests with the latest upstream tree plus these patches, assuming
they are not already in the tree. Might be a day or two, before I can get this
set up, however, as the machine in question is running some other tests on an older
kernel at the moment.

Thanks,
Ben

--
Ben Greear <greearb@xxxxxxxxxxxxxxx>
Candela Technologies Inc http://www.candelatech.com

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/