Re: Badness at kernel/rcutree.c:1228

From: kordex -
Date: Thu Dec 10 2009 - 09:24:22 EST


Hey,

memtester under Linux runs fine if i don't tell it to test too much
memory at one time: I have 512M installed and right after boot i can
test as root 460M but if i increase it too much so like it would mlock
470M I get faults. The odd thing is that when running as unpriviledged
user I get faults too but the program tells that it can't even get ram
at all mlocked.

When running as unpriviledged (not root):

navi:~> memtester 470 1

Copyright (C) 2007 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 470MB (492830720 bytes)
got 470MB (492830720 bytes), trying mlock ...too many pages, reducing...
got 469MB (492826624 bytes), trying mlock ...too many pages, reducing...
got 469MB (492822528 bytes), trying mlock ...too many pages, reducing...

...

got 0MB (81920 bytes), trying mlock ...too many pages, reducing...
got 0MB (77824 bytes), trying mlock ...too many pages, reducing...
got 0MB (73728 bytes), trying mlock ...too many pages, reducing...
got 0MB (69632 bytes), trying mlock ...too many pages, reducing...
got 0MB (65536 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok

Done.
navi:~>

dmesg:

munin-update: page allocation failure. order:1, mode:0x20
Call Trace:
[d8f35a30] [c0008a24] show_stack+0x4c/0x14c (unreliable)
[d8f35a70] [c005817c] __alloc_pages_nodemask+0x468/0x4b8
[d8f35b00] [c0077470] cache_alloc_refill+0x2d8/0x55c
[d8f35b50] [c0077858] kmem_cache_alloc+0x64/0xb0
[d8f35b70] [c0486320] sk_prot_alloc+0x2c/0x78
[d8f35b90] [c0486404] sk_clone+0x20/0x1b0
[d8f35bb0] [c04bca98] inet_csk_clone+0x1c/0x8c
[d8f35bc0] [c04d1380] tcp_create_openreq_child+0x20/0x2c4
[d8f35be0] [c04d00c8] tcp_v4_syn_recv_sock+0x58/0x17c
[d8f35c00] [c04d11ec] tcp_check_req+0x268/0x3dc
[d8f35c40] [c04cf8a4] tcp_v4_do_rcv+0xa0/0x198
[d8f35c70] [c04cfde0] tcp_v4_rcv+0x444/0x6d4
[d8f35ca0] [c04b4530] ip_local_deliver+0x104/0x1d8
[d8f35cc0] [c04b43f4] ip_rcv+0x508/0x540
[d8f35cf0] [c04922a0] netif_receive_skb+0x390/0x3bc
[d8f35d20] [c0492368] process_backlog+0x9c/0x134
[d8f35d50] [c0492b68] net_rx_action+0x80/0x190
[d8f35d80] [c0028e8c] __do_softirq+0xa4/0x120
[d8f35dc0] [c0006a0c] do_softirq+0x40/0x58
[d8f35dd0] [c0028db4] local_bh_enable+0x7c/0x9c
[d8f35de0] [c04852ec] release_sock+0x94/0xa8
[d8f35e00] [c04dcd70] inet_stream_connect+0x224/0x29c
[d8f35e50] [c0482a78] sys_connect+0x78/0xa8
[d8f35f00] [c04840ac] sys_socketcall+0xf0/0x240
[d8f35f40] [c0013cf4] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfd51434
LR = 0xff66550
Mem-Info:
DMA per-cpu:
CPU 0: hi: 186, btch: 31 usd: 9
active_anon:5179 inactive_anon:4611 isolated_anon:0
active_file:5690 inactive_file:104841 isolated_file:0
unevictable:0 dirty:26 writeback:0 unstable:0
free:2044 slab_reclaimable:2732 slab_unreclaimable:866
mapped:4325 shmem:161 pagetables:357 bounce:0
DMA free:8176kB min:2884kB low:3604kB high:4324kB active_anon:20716kB
inactive_anon:18444kB active_file:22760kB inactive_file:419364kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
mlocked:0kB dirty:104kB writeback:0kB mapped:17300kB shmem:644kB
slab_reclaimable:10928kB slab_unreclaimable:3464kB kernel_stack:912kB
pagetables:1428kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 2044*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 8176kB
110693 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 779144kB
Total swap = 779144kB
131072 pages RAM
0 pages HighMem
3391 pages reserved
113688 pages shared
24092 pages non-shared
munin-update: page allocation failure. order:1, mode:0x20
Call Trace:
[d62efa30] [c0008a24] show_stack+0x4c/0x14c (unreliable)
[d62efa70] [c005817c] __alloc_pages_nodemask+0x468/0x4b8
[d62efb00] [c0077470] cache_alloc_refill+0x2d8/0x55c
[d62efb50] [c0077858] kmem_cache_alloc+0x64/0xb0
[d62efb70] [c0486320] sk_prot_alloc+0x2c/0x78
[d62efb90] [c0486404] sk_clone+0x20/0x1b0
[d62efbb0] [c04bca98] inet_csk_clone+0x1c/0x8c
[d62efbc0] [c04d1380] tcp_create_openreq_child+0x20/0x2c4
[d62efbe0] [c04d00c8] tcp_v4_syn_recv_sock+0x58/0x17c
[d62efc00] [c04d11ec] tcp_check_req+0x268/0x3dc
[d62efc40] [c04cf8a4] tcp_v4_do_rcv+0xa0/0x198
[d62efc70] [c04cfde0] tcp_v4_rcv+0x444/0x6d4
[d62efca0] [c04b4530] ip_local_deliver+0x104/0x1d8
[d62efcc0] [c04b43f4] ip_rcv+0x508/0x540
[d62efcf0] [c04922a0] netif_receive_skb+0x390/0x3bc
[d62efd20] [c0492368] process_backlog+0x9c/0x134
[d62efd50] [c0492b68] net_rx_action+0x80/0x190
[d62efd80] [c0028e8c] __do_softirq+0xa4/0x120
[d62efdc0] [c0006a0c] do_softirq+0x40/0x58
[d62efdd0] [c0028db4] local_bh_enable+0x7c/0x9c
[d62efde0] [c04852ec] release_sock+0x94/0xa8
[d62efe00] [c04dcd70] inet_stream_connect+0x224/0x29c
[d62efe50] [c0482a78] sys_connect+0x78/0xa8
[d62eff00] [c04840ac] sys_socketcall+0xf0/0x240
[d62eff40] [c0013cf4] ret_from_syscall+0x0/0x38
--- Exception: c01 at 0xfd51434
LR = 0xff66550
Mem-Info:
DMA per-cpu:
CPU 0: hi: 186, btch: 31 usd: 152
active_anon:5217 inactive_anon:5036 isolated_anon:0
active_file:7419 inactive_file:100929 isolated_file:0
unevictable:0 dirty:22 writeback:0 unstable:0
free:3562 slab_reclaimable:2788 slab_unreclaimable:873
mapped:4388 shmem:161 pagetables:358 bounce:0
DMA free:14248kB min:2884kB low:3604kB high:4324kB active_anon:20868kB
inactive_anon:20144kB active_file:29676kB inactive_file:403716kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:520192kB
mlocked:0kB dirty:88kB writeback:0kB mapped:17552kB shmem:644kB
slab_reclaimable:11152kB slab_unreclaimable:3492kB kernel_stack:896kB
pagetables:1432kB unstable:0kB bounce:0kB writeback_tmp:0kB
pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
DMA: 3562*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB
0*1024kB 0*2048kB 0*4096kB = 14248kB
108510 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 779144kB
Total swap = 779144kB
131072 pages RAM
0 pages HighMem
3391 pages reserved
110176 pages shared
26054 pages non-shared


Running as root:

navi:~# memtester 470 1
memtester version 4.0.8 (32-bit)
Copyright (C) 2007 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffff000
want 470MB (492830720 bytes)
got 470MB (492830720 bytes), trying mlock ...locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
^C (takes too much time)

Dmesg did not show anything new.

Running as root:

navi:~# memtester 490 1

Trying to mlock: I lost console access, I guess ssh server got oom killed.

Reconnecting. Machine responses very slowly, logon took some 10 seconds.

Dmesgs:

As user:
http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.memtester.after.nonroot.txt
As root memtester 490 1
http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.memtester.after.root.toomuch.txt


--Mikko Kortelainen

2009/12/10 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
> On Wed, Dec 09, 2009 at 11:09:34PM +0200, kordex - wrote:
>> No, I have not, is there a way to do that while the system is running?
>
> This depends on the motherboard and firmware -- and I must unfortunately
> defer to others on this. ÂHowever, in my experience, you must usually
> do this from firmware prompt.
>
> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ÂThanx, Paul
>
>> 2009/12/9 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>> > On Wed, Dec 09, 2009 at 08:36:33PM +0200, kordex - wrote:
>> >> I will turn debugging options on after
>> >> http://lkml.org/lkml/2009/12/9/44 gets traced down so I can do that.
>> >
>> > Hmmm... ÂYou have recently run a memory test on your system, right?
>> >
>> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ÂThanx, Paul
>> >
>> >> --Mikko Kortelainen
>> >>
>> >> 2009/12/9 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>> >> > In that case, the best thing would be to drop the warning into the
>> >> > beginnings and ends of processing for system calls used by your workload.
>> >> > The hope would be to find it triggering at the end of a given syscall,
>> >> > permitting you to binary search the intervening code.
>> >> >
>> >> > Given that I cannot reproduce this, I cannot do much more than to offer
>> >> > random hints.
>> >> >
>> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ÂThanx, Paul
>> >> >
>> >> > On Wed, Dec 09, 2009 at 07:35:44PM +0200, kordex - wrote:
>> >> >> It did actually show the Badness after system had been running a long
>> >> >> time. And this cut from it shows that system was fully done kernel
>> >> >> init routines as there is ntpd running:
>> >> >>
>> >> >> warning: `ntpd' uses 32-bit capabilities (legacy support in use)
>> >> >> ------------[ cut here ]------------
>> >> >> Badness at kernel/rcutree.c:1228
>> >> >> NIP: c004ecbc LR: c004f14c CTR: c007bd70
>> >> >> REGS: df34dde0 TRAP: 0700 Â Not tainted Â(2.6.32)
>> >> >>
>> >> >> --Mikko Kortelainen
>> >> >>
>> >> >> 2009/12/9 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>> >> >> > Hmmm... ÂDidn't the first console output you sent me show the beenonline
>> >> >> > WARN_ON_ONCE() triggering during late boot? ÂYes, you had other failures
>> >> >> > later, but it might be that whatever is triggering this warning is
>> >> >> > related to those failures, right?
>> >> >> >
>> >> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ÂThanx, Paul
>> >> >> >
>> >> >> > On Wed, Dec 09, 2009 at 04:57:54PM +0200, kordex - wrote:
>> >> >> >> Sorry but,
>> >> >> >>
>> >> >> >> Where actually this "down nearer to the point in the boot-up sequence"
>> >> >> >> would be as I encountered the errors while the system was running (had
>> >> >> >> been for days).
>> >> >> >>
>> >> >> >> --Mikko Kortelainen
>> >> >> >>
>> >> >> >> 2009/12/9 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>> >> >> >> > On Wed, Dec 09, 2009 at 11:15:17AM +0200, kordex - wrote:
>> >> >> >> >> Hey,
>> >> >> >> >>
>> >> >> >> >> I hope it's in the right place.
>> >> >> >> >
>> >> >> >> > Looks fine to me.
>> >> >> >> >
>> >> >> >> > And the fact that you did -not- see anything in your dmesg indicates
>> >> >> >> > that the beenonline fields are set correctly at that point, as expected.
>> >> >> >> > You will only see a complaint if the beenonline fields have been
>> >> >> >> > corrupted.
>> >> >> >> >
>> >> >> >> > Please move them down nearer to the point in the boot-up sequence where
>> >> >> >> > you were seeing the failure. ÂPlease note that interrupts had been on
>> >> >> >> > for one good long time when your original kernel complained, so there
>> >> >> >> > had been a very large number of executions with beenonline set
>> >> >> >> > correctly.
>> >> >> >> >
>> >> >> >> > So it will probably be faster to start at the original failure
>> >> >> >> > and move towards boot rather than vice versa.
>> >> >> >> >
>> >> >> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ÂThanx, Paul
>> >> >> >> >
>> >> >> >> >> --Mikko Kortelainen
>> >> >> >> >>
>> >> >> >> >> navi:/usr/src# diff -Naur a/init/main.c b/init/main.c
>> >> >> >> >> --- a/init/main.c    2009-12-03 05:51:21.000000000 +0200
>> >> >> >> >> +++ b/init/main.c    2009-12-09 03:22:15.000000000 +0200
>> >> >> >> >> @@ -81,6 +81,9 @@
>> >> >> >> >> Â#include <asm/smp.h>
>> >> >> >> >> Â#endif
>> >> >> >> >>
>> >> >> >> >> +/* DEBUG STATEMENT 2009/12/08 */
>> >> >> >> >> +#include <linux/rcutree.h>
>> >> >> >> >> +
>> >> >> >> >> Âstatic int kernel_init(void *);
>> >> >> >> >>
>> >> >> >> >> Âextern void init_IRQ(void);
>> >> >> >> >> @@ -589,6 +592,10 @@
>> >> >> >> >> Â Â Â Â Â Â Â Â local_irq_disable();
>> >> >> >> >> Â Â Â Â }
>> >> >> >> >> Â Â Â Â rcu_init();
>> >> >> >> >> +
>> >> >> >> >> + Â Â Â /* DEBUG STATEMENT 2009/12/08 */
>> >> >> >> >> + Â Â Â WARN_ON_ONCE(rcu_check_beenonline());
>> >> >> >> >> +
>> >> >> >> >> Â Â Â Â /* init some links before init_ISA_irqs() */
>> >> >> >> >> Â Â Â Â early_irq_init();
>> >> >> >> >> Â Â Â Â init_IRQ();
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >>
>> >> >> >> >> 2009/12/9 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>> >> >> >> >> > On Wed, Dec 09, 2009 at 03:41:00AM +0200, kordex - wrote:
>> >> >> >> >> >> Hey,
>> >> >> >> >> >>
>> >> >> >> >> >> I put the debug function under init/main.c after rcu_init(); but there
>> >> >> >> >> >> is no output on dmesg which means that it receives zero value.
>> >> >> >> >> >>
>> >> >> >> >> >> Full dmesg: http://xnet.fi/opt/apps/lkml-2.6.32-vanilla.dmesg.rcu-init.txt
>> >> >> >> >> >
>> >> >> >> >> > Could you please send the patch you applied to, as you said, put the
>> >> >> >> >> > debug function under init/main.c after rcu_init()?
>> >> >> >> >> >
>> >> >> >> >> > Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â ÂThanx, Paul
>> >> >> >> >> >
>> >> >> >> >> >> --Mikko Kortelainen
>> >> >> >> >> >>
>> >> >> >> >> >> 2009/12/8 Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>:
>> >> >> >> >> >> > On Tue, Dec 08, 2009 at 11:22:07AM -0800, Paul E. McKenney wrote:
>> >> >> >> >> >> >> At this point, I must defer to those more skilled than I at diagnosing
>> >> >> >> >> >> >> early-boot problems.
>> >> >> >> >> >> >
>> >> >> >> >> >> > Well, that is silly on my part -- the problem seems to appear late in
>> >> >> >> >> >> > boot, and you had no problem capturing that portion of the boot log.
>> >> >> >> >> >> >
>> >> >> >> >> >> > So please see below for a patch providing a rcu_check_beenonline()
>> >> >> >> >> >> > function that, when called after rcu_init(), returns non-zero if the
>> >> >> >> >> >> > beenonline fields have become corrupted. ÂSo put calls of the form:
>> >> >> >> >> >> >
>> >> >> >> >> >> > Â Â Â ÂWARN_ON_ONCE(rcu_check_beenonline());
>> >> >> >> >> >> >
>> >> >> >> >> >> > in the initialization code path preceding the problem. ÂEither #include
>> >> >> >> >> >> > rcupdate.h or explicitly declare the function as appropriate.
>> >> >> >> >> >> >
>> >> >> >> >> >> > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
>> >> >> >> >> >> > ---
>> >> >> >> >> >> >
>> >> >> >> >> >> > diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
>> >> >> >> >> >> > index 9642c6b..190a687 100644
>> >> >> >> >> >> > --- a/include/linux/rcutree.h
>> >> >> >> >> >> > +++ b/include/linux/rcutree.h
>> >> >> >> >> >> > @@ -39,6 +39,8 @@ extern int rcu_cpu_notify(struct notifier_block *self,
>> >> >> >> >> >> > Âextern int rcu_needs_cpu(int cpu);
>> >> >> >> >> >> > Âextern int rcu_expedited_torture_stats(char *page);
>> >> >> >> >> >> >
>> >> >> >> >> >> > +extern int rcu_check_beenonline(void);
>> >> >> >> >> >> > +
>> >> >> >> >> >> > Â#ifdef CONFIG_TREE_PREEMPT_RCU
>> >> >> >> >> >> >
>> >> >> >> >> >> > Âextern void __rcu_read_lock(void);
>> >> >> >> >> >> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
>> >> >> >> >> >> > index 207125b..27d3722 100644
>> >> >> >> >> >> > --- a/kernel/rcutree.c
>> >> >> >> >> >> > +++ b/kernel/rcutree.c
>> >> >> >> >> >> > @@ -77,6 +77,17 @@ DEFINE_PER_CPU(struct rcu_data, rcu_sched_data);
>> >> >> >> >> >> > Âstruct rcu_state rcu_bh_state = RCU_STATE_INITIALIZER(rcu_bh_state);
>> >> >> >> >> >> > ÂDEFINE_PER_CPU(struct rcu_data, rcu_bh_data);
>> >> >> >> >> >> >
>> >> >> >> >> >> > +/*
>> >> >> >> >> >> > + * Ad-hoc diagnostic function, for use only after rcu_init() has
>> >> >> >> >> >> > + * returned. ÂAssumes that the boot CPU is CPU 0. ÂAssumes that
>> >> >> >> >> >> > + * the kernel has been built with CONFIG_TREE_RCU. ÂNot for inclusion.
>> >> >> >> >> >> > + * Usage: "WARN_ON_ONCE(rcu_check_beenonline());"
>> >> >> >> >> >> > + */
>> >> >> >> >> >> > +int rcu_check_beenonline(void)
>> >> >> >> >> >> > +{
>> >> >> >> >> >> > + Â Â Â return !per_cpu(rcu_sched_data, 0).beenonline ||
>> >> >> >> >> >> > + Â Â Â Â Â Â Â!per_cpu(rcu_bh_data, 0).beenonline;
>> >> >> >> >> >> > +}
>> >> >> >> >> >> >
>> >> >> >> >> >> > Â/*
>> >> >> >> >> >> > Â* Return true if an RCU grace period is in progress. ÂThe ACCESS_ONCE()s
>> >> >> >> >> >> >
>> >> >> >> >> >
>> >> >> >> >
>> >> >> >
>> >> >
>> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/