Re: kernel 2.6.37 : oops in cleanup_once

From: Yann Dupont
Date: Wed Feb 02 2011 - 10:04:18 EST


On 02/02/2011 15:53, Eric Dumazet wrote:
On Wednesday 2 February 2011 at 14:08 +0100, Yann Dupont wrote:
On 02/02/2011 12:24, Eric Dumazet wrote:
On Wednesday 2 February 2011 at 11:52 +0100, Eric Dumazet wrote:
On Wednesday 2 February 2011 at 09:53 +0100, Yann Dupont wrote:
Hello.
We recently upgraded one machine to vanilla 2.6.37, and have experienced 2
kernel oopses since. Each oops came after ~1 week of uptime.
The last oops was last night, but we didn't get any trace.
oops, 2.6.37 "only"

Yes this is a known problem.

Please try commit 3408404a4c2a4eead9d73b0bbbfe3f225b65f492
(inetpeer: Use correct AVL tree base pointer in inet_getpeer())

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3408404a4c2a4eead9d73b0bbbfe3f225b65f492

I believe David will send it to stable team shortly, if not already
done :)
Please ignore, this patch was for linux-2.6 tree, 2.6.37 was not
affected by the problem.

So it's another problem... Is there anything particular you do on this
machine?




Nothing really special there, we run a lot (20) of KVM guests (mainly
Linux firewalls for lots of different vlans), so we have a lot of
bridges, vlans & tun/tap.
Oh, and CONFIG_BRIDGE_IGMP_SNOOPING is set to n (because of the other
bug already sent to netdev - more to come in the next mail)

Hard to say if this BUG is new in 2.6.37. This host was running fine
with 2.6.34.2 since August 2010.
Bisecting will be hard due to the time to trigger the bug (and the fact
that this machine is a production machine)

Anyway, I can test with a specific kernel version if you suspect something.

I suspect a mem corruption from another layer (not inetpeer)

Unfortunately many kmem caches share the "64 bytes" cache.

Could you please add "slub_nomerge" to your boot command line?

Ok, will do it at 18:30 CET (to minimize impact)
Is the suspected bug SLUB related?

The 2.6.34.2 kernel previously used on that server used SLAB.


2 questions :
-How can I be sure slub_nomerge is active ? Boot message ?
-Is there a very severe impact on performance ?
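
On the first question, one way to check could be sketched as follows. The
kernel exposes the command line it was booted with in /proc/cmdline, and
also prints it as "Kernel command line: ..." in the boot log. This is only
a minimal sketch in Python; the helper names has_boot_flag and
read_cmdline are hypothetical, and it shows only that the option was
passed at boot, not what the allocator did with it.

```python
def has_boot_flag(cmdline: str, flag: str = "slub_nomerge") -> bool:
    """Return True if `flag` appears as a standalone token on the command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings.
    return flag in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()


# On the machine itself one would call:
#     has_boot_flag(read_cmdline())
```

Matching whole tokens avoids a false positive when the string only appears
inside another parameter.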

Regards,

--
Yann Dupont - Service IRTS, DSI Université de Nantes
Tel : 02.53.48.49.20 - Mail/Jabber : Yann.Dupont@xxxxxxxxxxxxxx
