Re: Kernel crash after using new Intel NIC (igb)

From: Arun Sharma
Date: Tue May 24 2011 - 17:33:32 EST


On Thu, May 12, 2011 at 11:15:53PM +0200, Eric Dumazet wrote:
>
> Probably not.
>
> What gives slub_nomerge=1 for you ?
>

It took me a while to get a new kernel on a large enough sample
of machines to get some data.

Like you observed in the other thread, this is unlikely to be a random
memory corruption.

The panics stopped after we moved the list_empty() check under the lock.

--- a/net/ipv4/inetpeer.c
+++ b/net/ipv4/inetpeer.c
@@ -154,11 +154,11 @@ void __init inet_initpeers(void)
 /* Called with or without local BH being disabled. */
 static void unlink_from_unused(struct inet_peer *p)
 {
+	spin_lock_bh(&unused_peers.lock);
 	if (!list_empty(&p->unused)) {
-		spin_lock_bh(&unused_peers.lock);
 		list_del_init(&p->unused);
-		spin_unlock_bh(&unused_peers.lock);
 	}
+	spin_unlock_bh(&unused_peers.lock);
 }

 static int addr_compare(const struct inetpeer_addr *a,

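The idea is that the list gets corrupted under some kind of race
condition. However, two threads racing on list_empty() and then both
executing list_del_init() seems harmless: the deletions are still
serialized by the lock, and list_del_init() on an entry that is already
unlinked just leaves it pointing at itself.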
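
To illustrate why a double list_del_init() looks harmless, here is a
minimal userspace sketch. The list helpers are simplified stand-ins that
I am assuming mirror the semantics of include/linux/list.h, and
double_del_leaves_list_intact() is a hypothetical name used only for
this example. The two calls stand for two threads that both passed the
unlocked list_empty() check and then took the lock one after the other.

```c
#include <assert.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert n right after head. */
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink e from its list and make it point at itself again. */
static void list_del_init(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
	INIT_LIST_HEAD(e);
}

/* Hypothetical helper: returns 1 if the list survives a double delete. */
static int double_del_leaves_list_intact(void)
{
	struct list_head unused, a, b;

	INIT_LIST_HEAD(&unused);
	list_add(&a, &unused);
	list_add(&b, &unused);

	list_del_init(&a);	/* first thread, holding the lock */
	list_del_init(&a);	/* second thread, after the first unlocked */

	return list_empty(&a) &&
	       unused.next == &b && unused.prev == &b &&
	       b.next == &unused && b.prev == &unused;
}
```

Calling double_del_leaves_list_intact() from a test main() should return
1, since the second list_del_init() only rewrites the entry's own
pointers. This models the serialized case only; it says nothing about
what else may touch p->unused concurrently.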

There is probably a different race condition that is mitigated by doing
the list_empty() check under the lock.

-Arun