Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

From: Jiri Kosina
Date: Sun Jul 21 2013 - 04:31:35 EST


On Sat, 20 Jul 2013, H. Peter Anvin wrote:

> > [ 0.212429] devtmpfs: initialized
> > [ 0.236027] int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> > [ 0.237157] Modules linked in:
> > [ 0.237765] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.11.0-rc1-01429-g04bf576 #8
> > [ 0.239129] task: ffff88000da1b040 ti: ffff88000da1c000 task.ti: ffff88000da1c000
> > [ 0.240000] RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>] ttwu_do_wakeup+0x28/0x225
> > [ 0.240000] RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
> > [ 0.240000] RAX: 0000000000000000 RBX: ffff88000dd12940 RCX: ffffffff81769c40
> > [ 0.240000] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000001
> > [ 0.240000] RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09: 0000000000000002
> > [ 0.240000] R10: ffffffff810ff484 R11: ffff88000dd129e8 R12: ffff88000dbc90c0
> > [ 0.240000] R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15: ffff88000da1dfd8
> > [ 0.240000] FS: 0000000000000000(0000) GS:ffff88000dd00000(0000) knlGS:0000000000000000
> > [ 0.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 0.240000] CR2: 00000000ffffffff CR3: 0000000001c88000 CR4: 00000000000006e0
> > [ 0.240000] Stack:
> > [ 0.240000] ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8 ffff88000dd03f48
> > [ 0.240000] ffffffff81109e2b ffff88000dd12940 0000000000000000 ffff88000dd03f68
> > [ 0.240000] ffffffff81109e9e 0000000000000000 0000000000012940 ffff88000dd03f98
> > [ 0.240000] Call Trace:
> > [ 0.240000] <IRQ>
> > [ 0.240000] [<ffffffff81109e2b>] ttwu_do_activate.constprop.56+0x6d/0x79
> > [ 0.240000] [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
> > [ 0.240000] [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
> > [ 0.240000] [<ffffffff8104dfb4>] smp_reschedule_interrupt+0x38/0x41
> > [ 0.240000] [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
> > [ 0.240000] <EOI>
> > [ 0.240000] [<ffffffff810ff484>] ? __atomic_notifier_call_chain+0x5/0xc1
> > [ 0.240000] [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
>
> Well, it is definitely easy to see what happened here.
>
> We took a breakpoint fault that the kernel didn't expect. This
> shouldn't happen... the breakpoint handler should have said "oh, this is
> an instruction being patched" and resumed, but that didn't happen.
>
> Jiri, I'm wondering if by any chance we have more than one CPU inside
> text_poke_bp() at the same time. The global variables in text_poke_bp()
> don't seem to be protected against reentrancy at all.

That shouldn't happen, because:


- text_poke_bp() should always be called under text_mutex (and
arch_jump_label_transform() does that properly)

- correctness between int3_notify() and text_poke_bp() wrt. the global
variables is achieved through memory barriers

So we should be safe here afaics.
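The two points above can be illustrated with a rough userspace analogue in C11. This is a sketch under assumptions, not the kernel code. A pthread mutex stands in for text_mutex, and a release/acquire fence pair stands in for smp_wmb()/smp_rmb(). The names patcher(), int3_reader(), reader_thread(), run_demo(), dummy_handler and PATCH_ADDR are hypothetical. The mutex keeps patchers from overwriting each other's globals. The fence pair guarantees that a reader which observes the flag also observes the handler and address stored before it. One difference: C11 requires the acquire fence to come after the flag load. The kernel handler issues smp_rmb() before reading the flag, because there the ordering point is the trap on the int3 byte itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the globals in alternative.c. */
static void *bp_int3_handler;
static uintptr_t bp_int3_addr;
static atomic_bool bp_patching_in_progress;

/* Stand-in for text_mutex: at most one patcher touches the globals. */
static pthread_mutex_t text_mutex = PTHREAD_MUTEX_INITIALIZER;

static int dummy_handler;                  /* its address plays the "handler" */
#define PATCH_ADDR ((uintptr_t)0x1000)     /* hypothetical patched address */

/* Writer side, loosely modelled on text_poke_bp(). */
static void *patcher(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&text_mutex);
    bp_int3_handler = &dummy_handler;
    bp_int3_addr = PATCH_ADDR;
    /* Like smp_wmb(): the two stores above are ordered before the flag. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&bp_patching_in_progress, true, memory_order_relaxed);
    pthread_mutex_unlock(&text_mutex);
    return NULL;
}

/* Reader side, loosely modelled on the int3 handler: returns the
 * redirect target if ip matches the patched address, NULL otherwise. */
static void *int3_reader(uintptr_t ip)
{
    if (!atomic_load_explicit(&bp_patching_in_progress, memory_order_relaxed))
        return NULL;                       /* globals are not read at all */
    /* Like smp_rmb(): pairs with the release fence in patcher(). */
    atomic_thread_fence(memory_order_acquire);
    if (ip != bp_int3_addr)
        return NULL;
    return bp_int3_handler;
}

/* Spins until the "trap" sees patching in progress, then reports the target. */
static void *reader_thread(void *out)
{
    void *target;
    while ((target = int3_reader(PATCH_ADDR)) == NULL)
        ;
    *(void **)out = target;
    return NULL;
}

/* Runs one reader and one patcher; returns the handler the reader saw. */
static void *run_demo(void)
{
    pthread_t w, r;
    void *seen = NULL;
    int rc = pthread_create(&r, NULL, reader_thread, &seen);
    assert(rc == 0);
    rc = pthread_create(&w, NULL, patcher, NULL);
    assert(rc == 0);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

Because the reader only touches the globals after it has observed the flag and issued the acquire fence, it can never pick up a stale handler or address. This is the property the barrier argument relies on.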

What I am wondering, however, is whether the jump label could have been
used before the int3 notifier was registered.
I am thinking about ways around this, but we'll probably have to do the
same thing ftrace does, i.e. hook into do_int3() directly instead of relying
on the notifier being registered in time.
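The ordering hypothesis can be modelled in a few lines of C. The sketch below is hypothetical throughout. struct mock_regs, int3_chain, register_int3_notifier(), do_int3_notifier() and do_int3_direct() are illustrative names, not kernel APIs. Only the checks inside poke_bp_int3_handler() follow the patch below, and the barrier is omitted. The model shows why a handler reached through a notifier can miss an int3 taken earlier in boot, since that notifier only exists once an initcall has registered it. A direct call from the trap handler has no such window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the saved register state (not pt_regs). */
struct mock_regs {
    uintptr_t ip;
    bool user_mode;
};

/* Hypothetical patching state that a text_poke_bp()-like patcher publishes. */
static bool bp_patching_in_progress;
static uintptr_t bp_int3_addr;
static uintptr_t bp_int3_handler;

/* Same checks as poke_bp_int3_handler() in the patch, minus the barrier. */
static int poke_bp_int3_handler(struct mock_regs *regs)
{
    if (!bp_patching_in_progress)
        return 0;
    if (regs->user_mode || regs->ip != bp_int3_addr)
        return 0;
    regs->ip = bp_int3_handler;   /* resume at the designated handler */
    return 1;
}

/* One-slot "notifier chain": empty until register_int3_notifier() runs,
 * much as int3_nb is only registered once its arch_initcall executes. */
typedef int (*int3_fn)(struct mock_regs *regs);
static int3_fn int3_chain;

static void register_int3_notifier(void)
{
    int3_chain = poke_bp_int3_handler;
}

/* Notifier-based dispatch: 0 means nobody claimed the trap, which for a
 * kernel-mode int3 ends in an "int3: 0000 [#1]" style oops. */
static int do_int3_notifier(struct mock_regs *regs)
{
    return int3_chain != NULL ? int3_chain(regs) : 0;
}

/* Direct dispatch, as the patch does from do_int3(): no registration step. */
static int do_int3_direct(struct mock_regs *regs)
{
    return poke_bp_int3_handler(regs);
}
```

With patching in progress, calling do_int3_notifier() before register_int3_notifier() leaves the trap unclaimed, while do_int3_direct() redirects ip immediately. That difference is what the test patch is meant to probe.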

Fengguang, as I am not able to reproduce this bug locally, could you do me
a favor and test whether the patch below works around the problem, just
for the sake of testing the hypothesis?

Thanks.


From: Jiri Kosina <jkosina@xxxxxxx>
Subject: [PATCH] x86: call out into int3 handler directly instead of using notifier

---
arch/x86/include/asm/alternative.h | 2 ++
arch/x86/kernel/alternative.c | 22 +++++++++++++++++++++-
arch/x86/kernel/traps.c | 4 ++++
3 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 3abf8dd..c22a41d 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -5,6 +5,7 @@
#include <linux/stddef.h>
#include <linux/stringify.h>
#include <asm/asm.h>
+#include <asm/ptrace.h>

/*
* Alternative inline assembly for SMP.
@@ -232,6 +233,7 @@ struct text_poke_param {
size_t len;
};

+extern int poke_bp_int3_handler(struct pt_regs *regs);
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 0ab4936..e1088f2 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -605,6 +605,24 @@ static void do_sync_core(void *info)
static bool bp_patching_in_progress;
static void *bp_int3_handler, *bp_int3_addr;

+int poke_bp_int3_handler(struct pt_regs *regs)
+{
+ /* bp_patching_in_progress */
+ smp_rmb();
+
+ if (likely(!bp_patching_in_progress))
+ return 0;
+
+ if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
+ return 0;
+
+ /* set up the specified breakpoint handler */
+ regs->ip = (unsigned long) bp_int3_handler;
+
+ return 1;
+
+}
+
static int int3_notify(struct notifier_block *self, unsigned long val, void *data)
{
struct die_args *args = data;
@@ -689,6 +707,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
return addr;
}

+#if 0
/* this one needs to run before anything else handles it as a
* regular exception */
static struct notifier_block int3_nb = {
@@ -700,8 +719,9 @@ static int __init int3_init(void)
{
return register_die_notifier(&int3_nb);
}
-
arch_initcall(int3_init);
+#endif
+
/*
* Cross-modifying kernel text with stop_machine().
* This code originally comes from immediate value.
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 772e2a8..e464764 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -58,6 +58,7 @@
#include <asm/mce.h>
#include <asm/fixmap.h>
#include <asm/mach_traps.h>
+#include <asm/alternative.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct pt_regs *regs, long error_co
ftrace_int3_handler(regs))
return;
#endif
+ if (poke_bp_int3_handler(regs))
+ return;
+
prev_state = exception_enter();
#ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,

--
Jiri Kosina
SUSE Labs