Re: linux-3.7.1: OOPS in page_lock_anon_vma
From: Hugh Dickins
Date: Sun Jan 06 2013 - 19:20:53 EST
On Sun, 6 Jan 2013, Martin Mokrejs wrote:
> I was running 3.7.1 kernel quite fine for a while but I realized that it is slow and that
> I should go and drop useless kernel drivers from my kernel. I have a SandyBridge-based
> laptop and I found that I gain speed while setting CONFIG_NO_HZ=y, CONFIG_PREEMPT_NONE=y,
> removing multicore scheduler, asking the configurator to set the maximum number of CPUs for my
> system (and not blindly specifying 4 for my dual-core i7 processor).
> Further I get faster system while removing IOMMU and DMA redirects while it still
> emulates NUMA. And, I switched away from CFQ scheduler to deadline and from SLAB to SLUB.
> Finally, I made sure my CPU cores do not go back and forth between C0 and C7 states and
> dynamically shut down the 2 hyperthreaded cores. So I have really only two, physical cores
> accessible. With performance CPU governor I have 1/2 of context switches and both cores
> can be saturated by whatever jobs (kernel compile or some computational jobs). It was not
> possible to get the CPU running at turbo speed for a long while as it always went down
> time to time. With ondemand governor I had cores in C7 for 50-70% of the time, that was
> a bit better with performance governor but having the two hyperthreaded cores disabled
> reduced the context switches by half, rescheduling interrupts went down by several orders
> of magnitude. So it is crunching at max turbo speed on both cores, temp about 80 °C.
>
> I think none of the changes relates to the kernel crash directly but I had not a single crash
> with 3.7.1 for a few weeks. After the tweaks I had 3-4 crashes this afternoon. The system always
> locked up so I could not see anything. Luckily, be it actually the same crash or not, now my X11
> screen was dropped to my framebuffer console and I got to see a kernel stacktrace. Here
> is the first, fished out from /var/log/messages upon next bootup:
>
>
> Jan 6 22:37:29 vostro kernel: [ 7663.251110] general protection fault: 0000 [#1] SMP
> Jan 6 22:37:29 vostro kernel: [ 7663.251135] Modules linked in: i915 fbcon bitblit cfbfillrect softcursor cfbimgblt i2c_algo_bit font cfbcopyarea drm_kms_helper drm fb iwldvm iwlwifi fbdev sata_sil24
> Jan 6 22:37:29 vostro kernel: [ 7663.251197] CPU 1
> Jan 6 22:37:29 vostro kernel: [ 7663.251206] Pid: 795, comm: kswapd0 Not tainted 3.7.1-default #22 Dell Inc. Vostro 3550/
> Jan 6 22:37:29 vostro kernel: [ 7663.251229] RIP: 0010:[<ffffffff815d3dee>] [<ffffffff815d3dee>] mutex_trylock+0xb/0x26
> Jan 6 22:37:29 vostro kernel: [ 7663.251257] RSP: 0018:ffff88040d25bbb8 EFLAGS: 00010246
> Jan 6 22:37:29 vostro kernel: [ 7663.251273] RAX: 0000000000000001 RBX: ffff88040bfdc000 RCX: ffff88040d25bce8
> Jan 6 22:37:29 vostro kernel: [ 7663.251293] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0720072007200728
> Jan 6 22:37:29 vostro kernel: [ 7663.251313] RBP: ffff88040d25bbb8 R08: dead000000200200 R09: dead000000100100
> Jan 6 22:37:29 vostro kernel: [ 7663.251333] R10: ffff88040d25bc38 R11: ffff8804078acec0 R12: ffff88040bfdc001
> Jan 6 22:37:29 vostro kernel: [ 7663.251354] R13: ffffea0010137440 R14: 0720072007200728 R15: 0000000000000001
> Jan 6 22:37:29 vostro kernel: [ 7663.251374] FS: 0000000000000000(0000) GS:ffff88041fa80000(0000) knlGS:0000000000000000
> Jan 6 22:37:29 vostro kernel: [ 7663.251396] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jan 6 22:37:29 vostro kernel: [ 7663.251413] CR2: 00002b876c545978 CR3: 00000000018f6000 CR4: 00000000000407e0
> Jan 6 22:37:29 vostro kernel: [ 7663.251432] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Jan 6 22:37:29 vostro kernel: [ 7663.251452] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Jan 6 22:37:29 vostro kernel: [ 7663.251472] Process kswapd0 (pid: 795, threadinfo ffff88040d25a000, task ffff88040d07ce30)
> Jan 6 22:37:29 vostro kernel: [ 7663.251494] Stack:
> Jan 6 22:37:29 vostro kernel: [ 7663.251501] ffff88040d25bbe8 ffffffff810f6994 ffffea0010137440 0000000000000000
> Jan 6 22:37:29 vostro kernel: [ 7663.251527] ffff88040d25bde8 ffff88041fddad00 ffff88040d25bc58 ffffffff810f6b9e
> Jan 6 22:37:29 vostro kernel: [ 7663.251551] 0000000000000000 ffff8804046d2dc0 00000000810dee97 ffff88040d25bce8
> Jan 6 22:37:29 vostro kernel: [ 7663.251576] Call Trace:
> Jan 6 22:37:29 vostro kernel: [ 7663.251587] [<ffffffff810f6994>] page_lock_anon_vma+0x40/0xaf
> Jan 6 22:37:29 vostro kernel: [ 7663.251605] [<ffffffff810f6b9e>] page_referenced+0x78/0x1b7
> Jan 6 22:37:29 vostro kernel: [ 7663.251623] [<ffffffff810e026a>] shrink_active_list+0x209/0x305
> Jan 6 22:37:29 vostro kernel: [ 7663.251641] [<ffffffff810e1269>] kswapd+0x3fe/0x8ea
> Jan 6 22:37:29 vostro kernel: [ 7663.251658] [<ffffffff81091697>] ? wake_up_bit+0x25/0x25
> Jan 6 22:37:29 vostro kernel: [ 7663.251675] [<ffffffff810e0e6b>] ? try_to_free_pages+0x8c/0x8c
> Jan 6 22:37:29 vostro kernel: [ 7663.251692] [<ffffffff81091120>] kthread+0x90/0x98
> Jan 6 22:37:29 vostro kernel: [ 7663.251707] [<ffffffff81091090>] ? kthread_freezable_should_stop+0x3c/0x3c
> Jan 6 22:37:29 vostro kernel: [ 7663.251727] [<ffffffff815d5dec>] ret_from_fork+0x7c/0xb0
> Jan 6 22:37:29 vostro kernel: [ 7663.251743] [<ffffffff81091090>] ? kthread_freezable_should_stop+0x3c/0x3c
> Jan 6 22:37:29 vostro kernel: [ 7663.251762] Code: 8d 53 08 c7 03 01 00 00 00 48 39 d0 74 09 48 8b 78 10 e8 a0 79 ac ff 66 83 43 04 01 5a 5b c9 c3 55 b8 01 00 00 00 48 89 e5 31 d2 <f0> 0f b1 17 ff c8 75 0f 65 48 8b 04 25 00 b8 00 00 b2 01 48 89
> Jan 6 22:37:29 vostro kernel: [ 7663.251898] RIP [<ffffffff815d3dee>] mutex_trylock+0xb/0x26
> Jan 6 22:37:29 vostro kernel: [ 7663.251916] RSP <ffff88040d25bbb8>
> Jan 6 22:37:29 vostro kernel: [ 7663.471083] ---[ end trace 15db67145b2c838a ]---
> Jan 6 22:37:39 vostro kernel: [ 7672.954999] SysRq : Emergency Sync
>
>
>
> It seemed the kernel was still running, disk was doing some work and CPU fan was changing its speed.
> I then pressed alt+sysrq+i and got (retyped from a camera picture which is attached as this one was
> not in /var/log/messages):
>
> lock_anon_vma_root.clone
> unlink_anon_vmas
> free_pgtables
> exit_mmap
> mmput
> exit_mm
> do_exit
> ? recalc_sigpending_tsk
> do_group_exit
> get_signal_to_deliver
> do_signal
> ? timespec_add_safe
> ? __fput
> do_notify_resume
> int_signal
>
> But the system was dead, I had to turn off the power.
>
>
> Any clues? What kernel .config item should I enable/disable to avoid it in the future? ;-)
> Thank you,
> Martin
One of your struct anon_vmas seems to have been overwritten with 0x0720s.
I've no idea why. But since you mention you've put SLUB in, best to take
advantage of it by rebooting with slub_debug=AFPZ and see if that shows
up anything interesting.
Hugh
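
For readers following along, the "0x0720s" remark can be checked against the
register dump in the oops. The sketch below is only an illustration, not part
of the original exchange: the helper names words16 and is_canonical are
made up here, and the 48-bit virtual address width is an assumption about
this x86-64 machine. It splits RDI (R14 holds the same value) into 16-bit
words and tests whether the value is a canonical address. A non-canonical
address would be consistent with the report being a general protection fault
rather than a page fault.

```python
# Register values copied from the oops quoted above.
RDI = 0x0720072007200728  # argument passed to mutex_trylock (R14 is identical)
RBX = 0xFFFF88040BFDC000  # an ordinary-looking kernel pointer, for comparison


def words16(value):
    """Split a 64-bit value into four 16-bit words, most significant first."""
    return [(value >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]


def is_canonical(addr):
    """Assuming 48-bit virtual addresses: bits 63..47 must all be equal."""
    top = addr >> 47  # the 17 most significant bits
    return top == 0 or top == 0x1FFFF


for name, value in (("RDI", RDI), ("RBX", RBX)):
    pattern = " ".join(f"{w:04x}" for w in words16(value))
    kind = "canonical" if is_canonical(value) else "non-canonical"
    print(f"{name}: {pattern} ({kind})")
```

The trailing 0728 in RDI would fit a pointer of 0x0720072007200720 plus an
8-byte field offset, but that reading of the structure layout is a guess and
is not stated anywhere in the report.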