oops in x86/oprofile/dump_stack with 3.4.6

From: wyang1
Date: Thu Aug 02 2012 - 03:05:44 EST


Hi all,

A couple of days ago I tried to use oprofile with enabling call graph in a recent build of 3.4.6. this causes a OOPS
"BUG: unable to handle kernel paging request at 636f7270".

The oops can be often reproduced by the following steps on my board based on Intel Atom.

opcontrol --no-vmlinux
opcontrol -c 10
opcontrol -e BR_INST_RETIRED:8000
opcontrl -s

With a few seconds I get the oops.

I've spent some time investigating the problem, and found it is possible that the marked line in dump_strace function results in the oops as the content of stack
variable is less than CONFIG_PAGE_OFFSET(0xc000_0000).
arch/x86/kernel/dumpstack_32.c
int dump_stack()
...
for (;;) {
struct thread_info *context;
context = (struct thread_info *)
((unsigned long)stack & (~(THREAD_SIZE - 1)));
bp = ops->walk_stack(context, stack, bp, ops, data, NULL, &graph);

-----> stack = (unsigned long *)context->previous_esp;
if (!stack)
break;
if (ops->stack(data, "IRQ") < 0)
break;
touch_nmi_watchdog();
}
...
Actually I think this scenario should not happen, because the stack should be kernel stack or irq stack.
So I changed this function by adding simplistic check to avoid this scenario. It seems to work for my board.

for (;;) {
struct thread_info *context;
+ /*
+ * Somehow stack may be less than CONFIG_PAGE_OFFSET,
+ * So we should temporarily avoid the scenario before
+ * figuring out the root cause.
+ */
+ if (stack < CONFIG_PAGE_OFFSET)
+ break;

context = (struct thread_info *)
((unsigned long)stack & (~(THREAD_SIZE - 1)));


Any suggestion for the change.

Thanks
Wei

The following is the oops I encountered.
BUG: unable to handle kernel paging request at 636f7270
IP: [<c1005229>] print_context_stack+0x69/0x130
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
LTT NESTING LEVEL : 0
Modules linked in:

Pid: 0, comm: swapper/0 Not tainted 3.4.6-WR5.0+snapshot-20120801_standard #9 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
EIP: 0060:[<c1005229>] EFLAGS: 00010046 CPU: 0
EIP is at print_context_stack+0x69/0x130
EAX: 0000002e EBX: 636f7270 ECX: 00000000 EDX: 04010000
ESI: ffffe000 EDI: 00000000 EBP: f680be98 ESP: f680be68
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: 636f7270 CR3: 0192e000 CR4: 000007d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process swapper/0 (pid: 0, ti=f680a000 task=c183cfa0 task.ti=c1832000)
Stack:
c1761c2c 636f7270 636f6000 00000000 636f7ffc 636f6000 c1833ed4 ffffe000
c1833ed4 636f7270 c18888dc 636f6000 f680bec8 c1004218 c18888dc f680bedc
00000000 f680beb4 c1833ed4 00000000 f680bec0 c1833ed4 f680bfc4 0000000a
Call Trace:
[<c1004218>] dump_trace+0x68/0xf0
[<c1554682>] x86_backtrace+0xb2/0xc0
[<c1552682>] oprofile_add_sample+0xa2/0xc0
[<c10040df>] ? do_softirq+0x6f/0xa0
[<c15562c9>] ppro_check_ctrs+0x79/0x100
[<c1556250>] ? ppro_shutdown+0x60/0x60
[<c15550af>] profile_exceptions_notify+0x8f/0xb0
[<c1660d78>] nmi_handle.isra.0+0x48/0x70
[<c1660e9f>] do_nmi+0xff/0x570
[<c105bd75>] ? run_rebalance_domains+0x155/0x180
[<c105428b>] ? get_parent_ip+0xb/0x40
[<c102e379>] ? __local_bh_enable+0x29/0x70
[<c102ecd0>] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
[<c16601c9>] nmi_stack_correct+0x28/0x2d
[<c102ecd0>] ? ftrace_define_fields_irq_handler_entry+0x80/0x80
[<c10040df>] ? do_softirq+0x6f/0xa0
<IRQ>
[<c102f155>] irq_exit+0x65/0x70
[<c1666441>] do_IRQ+0x51/0xc0
[<c1666369>] common_interrupt+0x29/0x30
[<c102007b>] ? amd_get_subcaches+0x4b/0x90
[<c1330416>] ? intel_idle+0xc6/0x120
[<c14e4719>] cpuidle_enter+0x19/0x30
[<c14e4cf0>] cpuidle_idle_call+0xa0/0x320
[<c1009e8a>] cpu_idle+0x5a/0xc0
[<c163fa48>] rest_init+0x6c/0x74
[<c189d703>] start_kernel+0x2fe/0x305
[<c189d23d>] ? repair_env_string+0x51/0x51
[<c189d078>] i386_start_kernel+0x78/0x7d
Code: 00 00 3b 5d dc 72 13 8b 45 f0 8d 64 24 24 5b 5e 5f 5d c3 8d b4 26 00 00 00 00 3b 5d ec 72 e8 81 fb ff ff ff bf 0f 86 a3 00 00 00 <8b> 33 89 f0 e8 ae e5 03 00 85 c0 74 1e 8b 45 f0 83 c0 04 39 c3
EIP: [<c1005229>] print_context_stack+0x69/0x130 SS:ESP 0068:f680be68
CR2: 00000000636f7270
---[ end trace d4af25ee5ff6fd8c ]---
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/