BUG: sleeping while atomic from ipmr_destroy_unres

From: Tom Burns
Date: Thu Apr 02 2009 - 15:28:40 EST


Hi All,

I apologize for talking about a kernel bug against 2.6.20, but it's
in a production environment, I don't have the resources to replace the
kernel and reproduce the issue at this time, and from my code
comparison this issue appears to still exist in 2.6.29 (might_sleep()
still results in schedule() call if CONFIG_PREEMPT_VOLUNTARY=y,
__do_softirq still calls __local_bh_disable(), which still puts us in
atomic for the duration of the __do_softirq - which makes sense, and
the ipmr code is still the same).

I'm debugging a kernel BUG reported in a production environment
running 2.6.20 SMP (from Redhat RPM, full version
2.6.20-1.2320.fc5smp) and have run into 2 related BUGs. When we only
see the first it's non-fatal, but the second BUG (below) is always
paired with a panic and resultant crash.

Here is the first BUG that makes it to the log:

BUG: sleeping function called from invalid context at kernel/mutex.c:86
in_atomic():1, irqs_disabled():0
[phys_startup_32+-1084201671/-1073741824] ipmr_expire_process+0x0/0x8e
[phys_startup_32+-1084105458/-1073741824] mutex_lock+0x15/0x29
[phys_startup_32+-1084464563/-1073741824] rtnetlink_rcv+0x17/0x3d
[phys_startup_32+-1084407949/-1073741824] netlink_data_ready+0x12/0x52
[phys_startup_32+-1084412015/-1073741824] netlink_sendskb+0x1c/0x33
[phys_startup_32+-1084463665/-1073741824] rtnl_unicast+0x18/0x23
[phys_startup_32+-1084201716/-1073741824] ipmr_destroy_unres+0xc7/0xf4
[phys_startup_32+-1086087209/-1073741824] hrtimer_run_queues+0x127/0x141
[phys_startup_32+-1085998565/-1073741824] enable_irq+0x92/0xaf
[phys_startup_32+-1084161925/-1073741824] xfrm_timer_handler+0x18d/0x21a
[phys_startup_32+-1084201576/-1073741824] ipmr_expire_process+0x5f/0x8e
[phys_startup_32+-1086134877/-1073741824] run_timer_softirq+0x101/0x164
[phys_startup_32+-1086148568/-1073741824] __do_softirq+0x5d/0xba
[phys_startup_32+-1086299719/-1073741824] do_softirq+0x59/0xb1
[phys_startup_32+-1086179650/-1073741824] scheduler_tick+0x7c/0xdc
[phys_startup_32+-1085997014/-1073741824] handle_edge_irq+0x0/0x10a
[phys_startup_32+-1086299433/-1073741824] do_IRQ+0xc6/0xdb
[phys_startup_32+-1086306041/-1073741824] common_interrupt+0x23/0x28
[phys_startup_32+-1084161997/-1073741824] xfrm_timer_handler+0x145/0x21a

It appears that the top ipmr_expire_process call is some sort of
reporting mistake in dump_stack(), I don't see how it's being executed
by mutex_lock().

I've determined that the problem is caused by a combination of config
flags and code path.

Our kernel config has:
CONFIG_PREEMPT_VOLUNTARY = y
CONFIG_PREEMPT_BKL = y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
CONFIG_PREEMPT is not set and neither is CONFIG_PREEMPT_NONE

With these flags, in_atomic() is always set to '1' during the entire
time doing the softirq.

The code path which causes the BUG is in net/ipv4/ipmr.c in function
ipmr_destroy_unres:

307 if (skb->nh.iph->version == 0) {
308 struct nlmsghdr *nlh = (struct nlmsghdr
*)skb_pull(skb, sizeof(struct iphdr));
309 nlh->nlmsg_type = NLMSG_ERROR;
310 nlh->nlmsg_len =
NLMSG_LENGTH(sizeof(struct nlmsgerr));
311 skb_trim(skb, nlh->nlmsg_len);
312 e = NLMSG_DATA(nlh);
313 e->error = -ETIMEDOUT;
314 memset(&e->msg, 0, sizeof(e->msg));
315
316 rtnl_unicast(skb, NETLINK_CB(skb).pid);
317 } else
318 kfree_skb(skb);

So the BUG only occurs if we try to unicast out a netlink error packet
in response to the inability to resolve a netlink multicast packet
(iph->version == 0). When this happens we eventually call mutex_lock,
who calls might_sleep(), which with our kernel config results in
reporting the BUG in __might_sleep(), and then attempting to
schedule() (with CONFIG_PREEMPT_VOLUNTARY = y, might_sleep() resolves
to "__might_sleep(); schedule();" ).

I'm unsure if we're supposed to be in_atomic at this point because in
hrtimer_run_queue before running the timer function
(ipmr_expire_process) seems to try to re-enable kernel preemption
(__run_hrtimer calls spin_unlock calls _spin_unlock calls
preempt_enable(), which resolves to nothing since CONFIG_PREEMPT = n.

Unfortunately this otherwise harmless BUG seems to become fatal to the
system if there is a process waiting to be scheduled when mutex_lock
calls might_sleep(). When that's the case, we see the following BUG
immediately after the first one:

BUG: scheduling while atomic: Parser/0x00000100/2465
[phys_startup_32+-1084110266/-1073741824] __sched_text_start+0x56/0xa21
[phys_startup_32+-1086166488/-1073741824] release_console_sem+0x17f/0x1be
[<e0510524>] __nf_conntrack_confirm+0x2a6/0x2d9 [nf_conntrack]
[phys_startup_32+-1086299433/-1073741824] do_IRQ+0xc6/0xdb
[phys_startup_32+-1084105164/-1073741824] __mutex_lock_slowpath+0x45/0x77
[phys_startup_32+-1084105441/-1073741824] mutex_lock+0x26/0x29
[phys_startup_32+-1084464563/-1073741824] rtnetlink_rcv+0x17/0x3d
[phys_startup_32+-1084407949/-1073741824] netlink_data_ready+0x12/0x52
[phys_startup_32+-1084412015/-1073741824] netlink_sendskb+0x1c/0x33
[phys_startup_32+-1084409373/-1073741824] netlink_ack+0x178/0x193
[phys_startup_32+-1084409070/-1073741824] netlink_run_queue+0x7b/0xca
[phys_startup_32+-1084464481/-1073741824] rtnetlink_rcv_msg+0x0/0x1e4
[phys_startup_32+-1086302708/-1073741824] dump_stack+0x12/0x14
[phys_startup_32+-1084201671/-1073741824] ipmr_expire_process+0x0/0x8e
[phys_startup_32+-1084464549/-1073741824] rtnetlink_rcv+0x25/0x3d
[phys_startup_32+-1084407949/-1073741824] netlink_data_ready+0x12/0x52
[phys_startup_32+-1084412015/-1073741824] netlink_sendskb+0x1c/0x33
[phys_startup_32+-1084463665/-1073741824] rtnl_unicast+0x18/0x23
[phys_startup_32+-1084201716/-1073741824] ipmr_destroy_unres+0xc7/0xf4
[phys_startup_32+-1086087209/-1073741824] hrtimer_run_queues+0x127/0x141
[phys_startup_32+-1085998565/-1073741824] enable_irq+0x92/0xaf
[phys_startup_32+-1084161925/-1073741824] xfrm_timer_handler+0x18d/0x21a
[phys_startup_32+-1084201576/-1073741824] ipmr_expire_process+0x5f/0x8e
[phys_startup_32+-1086134877/-1073741824] run_timer_softirq+0x101/0x164
[phys_startup_32+-1086148568/-1073741824] __do_softirq+0x5d/0xba
[phys_startup_32+-1086299719/-1073741824] do_softirq+0x59/0xb1
[phys_startup_32+-1086179650/-1073741824] scheduler_tick+0x7c/0xdc
[phys_startup_32+-1085997014/-1073741824] handle_edge_irq+0x0/0x10a
[phys_startup_32+-1086299433/-1073741824] do_IRQ+0xc6/0xdb
[phys_startup_32+-1086306041/-1073741824] common_interrupt+0x23/0x28
[phys_startup_32+-1084161997/-1073741824] xfrm_timer_handler+0x145/0x21a
=======================

At this point the computer has effectively crashed.

I'm a complete newcomer to netlink packets, so while I'm going to try
to send some unresolvable netlink packets with hope to make this BUG
easily reproduce able, if anyone has any other ideas I'd love to hear
them.

Thank you,
Tom Burns
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/