Re: [x86] Kernel panic - not syncing: Fatal exception in interrupt

From: H. Peter Anvin
Date: Sun Jul 21 2013 - 13:21:52 EST


Honestly, why don't we make the patch list (rbtree, whatever) a permanent part of the default breakpoint handler. It only applies to kernel space anyway and the kernel doesn't have any permanent breakpoints so there should be no performance reason not to.

Jiri Kosina <jkosina@xxxxxxx> wrote:
>On Sat, 20 Jul 2013, H. Peter Anvin wrote:
>
>> > [ 0.212429] devtmpfs: initialized
>> > [ 0.236027] int3: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> > [ 0.237157] Modules linked in:
>> > [ 0.237765] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
>3.11.0-rc1-01429-g04bf576 #8
>> > [ 0.239129] task: ffff88000da1b040 ti: ffff88000da1c000 task.ti:
>ffff88000da1c000
>> > [ 0.240000] RIP: 0010:[<ffffffff811098cc>] [<ffffffff811098cc>]
>ttwu_do_wakeup+0x28/0x225
>> > [ 0.240000] RSP: 0000:ffff88000dd03f10 EFLAGS: 00000006
>> > [ 0.240000] RAX: 0000000000000000 RBX: ffff88000dd12940 RCX:
>ffffffff81769c40
>> > [ 0.240000] RDX: 0000000000000002 RSI: 0000000000000000 RDI:
>0000000000000001
>> > [ 0.240000] RBP: ffff88000dd03f28 R08: ffffffff8176a8c0 R09:
>0000000000000002
>> > [ 0.240000] R10: ffffffff810ff484 R11: ffff88000dd129e8 R12:
>ffff88000dbc90c0
>> > [ 0.240000] R13: ffff88000dbc90c0 R14: ffff88000da1dfd8 R15:
>ffff88000da1dfd8
>> > [ 0.240000] FS: 0000000000000000(0000)
>GS:ffff88000dd00000(0000) knlGS:0000000000000000
>> > [ 0.240000] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>> > [ 0.240000] CR2: 00000000ffffffff CR3: 0000000001c88000 CR4:
>00000000000006e0
>> > [ 0.240000] Stack:
>> > [ 0.240000] ffff88000dd12940 ffff88000dbc90c0 ffff88000da1dfd8
>ffff88000dd03f48
>> > [ 0.240000] ffffffff81109e2b ffff88000dd12940 0000000000000000
>ffff88000dd03f68
>> > [ 0.240000] ffffffff81109e9e 0000000000000000 0000000000012940
>ffff88000dd03f98
>> > [ 0.240000] Call Trace:
>> > [ 0.240000] <IRQ>
>> > [ 0.240000] [<ffffffff81109e2b>]
>ttwu_do_activate.constprop.56+0x6d/0x79
>> > [ 0.240000] [<ffffffff81109e9e>] sched_ttwu_pending+0x67/0x84
>> > [ 0.240000] [<ffffffff8110c845>] scheduler_ipi+0x15a/0x2b0
>> > [ 0.240000] [<ffffffff8104dfb4>]
>smp_reschedule_interrupt+0x38/0x41
>> > [ 0.240000] [<ffffffff8173bf5d>] reschedule_interrupt+0x6d/0x80
>> > [ 0.240000] <EOI>
>> > [ 0.240000] [<ffffffff810ff484>] ?
>__atomic_notifier_call_chain+0x5/0xc1
>> > [ 0.240000] [<ffffffff8105cc30>] ? native_safe_halt+0xd/0x16
>>
>> Well, it is definitely easy to see what happened here.
>>
>> We took a breakpoint fault that the kernel didn't expect. This
>> shouldn't happen... the breakpoint handler should have said "oh, this
>is
>> an instruction being patched" and resumed, but that didn't happen.
>>
>> Jiri, I'm wondering if by any chance we have more than one CPU inside
>> text_poke_bp() at the same time. The global variables in
>text_poke_bp()
>> don't seem to be protected against reentrancy at all.
>
>That shouldn't happen, because:
>
>
>- text_poke_bp() should always be called under text_mutex (and
> arch_jump_label_transform() does that properly)
>
>- correctness between int3_notify() and texp_poke_bp() wrt. global
> variables is achieved through barrier
>
>So we should be safe here afaics.
>
>What I am however wondering whether can't be case here is that the jump
>
>label was used before int3_notifier has been registered.
>I am thinking about ways around this, but we'll probably have to do the
>
>same ftrace is doing, i.e. hook into do_int3() directly instead of
>relying
>on the notifier to be registered in time.
>
>Fengguang, as I am not able to reproduce this bug locally, could you do
>me
>a favor and test whether the patch below works the problem around, just
>
>for the sake of testing the hypothesis?
>
>Thanks.
>
>
>From: Jiri Kosina <jkosina@xxxxxxx>
>Subject: [PATCH] x86: call out into int3 handler directly instead of
>using notifier
>
>---
> arch/x86/include/asm/alternative.h | 2 ++
> arch/x86/kernel/alternative.c | 22 +++++++++++++++++++++-
> arch/x86/kernel/traps.c | 4 ++++
> 3 files changed, 27 insertions(+), 1 deletions(-)
>
>diff --git a/arch/x86/include/asm/alternative.h
>b/arch/x86/include/asm/alternative.h
>index 3abf8dd..c22a41d 100644
>--- a/arch/x86/include/asm/alternative.h
>+++ b/arch/x86/include/asm/alternative.h
>@@ -5,6 +5,7 @@
> #include <linux/stddef.h>
> #include <linux/stringify.h>
> #include <asm/asm.h>
>+#include <asm/ptrace.h>
>
> /*
> * Alternative inline assembly for SMP.
>@@ -232,6 +233,7 @@ struct text_poke_param {
> size_t len;
> };
>
>+extern int poke_bp_int3_handler(struct pt_regs *regs);
> extern void *text_poke(void *addr, const void *opcode, size_t len);
>extern void *text_poke_bp(void *addr, const void *opcode, size_t len,
>void *handler);
>extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
>diff --git a/arch/x86/kernel/alternative.c
>b/arch/x86/kernel/alternative.c
>index 0ab4936..e1088f2 100644
>--- a/arch/x86/kernel/alternative.c
>+++ b/arch/x86/kernel/alternative.c
>@@ -605,6 +605,24 @@ static void do_sync_core(void *info)
> static bool bp_patching_in_progress;
> static void *bp_int3_handler, *bp_int3_addr;
>
>+int poke_bp_int3_handler(struct pt_regs *regs)
>+{
>+ /* bp_patching_in_progress */
>+ smp_rmb();
>+
>+ if (likely(!bp_patching_in_progress))
>+ return 0;
>+
>+ if (user_mode_vm(regs) || regs->ip != (unsigned long)bp_int3_addr)
>+ return 0;
>+
>+ /* set up the specified breakpoint handler */
>+ regs->ip = (unsigned long) bp_int3_handler;
>+
>+ return 1;
>+
>+}
>+
>static int int3_notify(struct notifier_block *self, unsigned long val,
>void *data)
> {
> struct die_args *args = data;
>@@ -689,6 +707,7 @@ void *text_poke_bp(void *addr, const void *opcode,
>size_t len, void *handler)
> return addr;
> }
>
>+#if 0
> /* this one needs to run before anything else handles it as a
> * regular exception */
> static struct notifier_block int3_nb = {
>@@ -700,8 +719,9 @@ static int __init int3_init(void)
> {
> return register_die_notifier(&int3_nb);
> }
>-
> arch_initcall(int3_init);
>+#endif
>+
> /*
> * Cross-modifying kernel text with stop_machine().
> * This code originally comes from immediate value.
>diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
>index 772e2a8..e464764 100644
>--- a/arch/x86/kernel/traps.c
>+++ b/arch/x86/kernel/traps.c
>@@ -58,6 +58,7 @@
> #include <asm/mce.h>
> #include <asm/fixmap.h>
> #include <asm/mach_traps.h>
>+#include <asm/alternative.h>
>
> #ifdef CONFIG_X86_64
> #include <asm/x86_init.h>
>@@ -324,6 +325,9 @@ dotraplinkage void __kprobes notrace do_int3(struct
>pt_regs *regs, long error_co
> ftrace_int3_handler(regs))
> return;
> #endif
>+ if (poke_bp_int3_handler(regs))
>+ return;
>+
> prev_state = exception_enter();
> #ifdef CONFIG_KGDB_LOW_LEVEL_TRAP
> if (kgdb_ll_trap(DIE_INT3, "int3", regs, error_code, X86_TRAP_BP,

--
Sent from my mobile phone. Please excuse brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/