Re: WARNING in do_debug

From: David Hildenbrand
Date: Tue Nov 07 2017 - 12:44:42 EST


On 31.10.2017 12:47, Dmitry Vyukov wrote:
> On Tue, Oct 31, 2017 at 2:34 PM, syzbot
> <bot+adbefe6736a5b37af36f19ebfa8764fcdd9ddaed@xxxxxxxxxxxxxxxxxxxxxxxxx>
> wrote:
>> Hello,
>>
>> syzkaller hit the following crash on
>> 0787643a5f6aad1f0cdeb305f7fe492b71943ea4
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/master
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached
>> Raw console output is attached.
>>
>> syzkaller reproducer is attached. See https://goo.gl/kgGztJ
>> for information about syzkaller reproducers
>>
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 3045 at arch/x86/kernel/traps.c:776
>> cond_local_irq_disable arch/x86/kernel/traps.c:85 [inline]
>> WARNING: CPU: 0 PID: 3045 at arch/x86/kernel/traps.c:776
>> do_debug+0x4d8/0x6e0 arch/x86/kernel/traps.c:790
>> Kernel panic - not syncing: panic_on_warn set ...
>>
>> CPU: 0 PID: 3045 Comm: syz-executor6 Not tainted 4.14.0-rc5+ #142
>> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> Google 01/01/2011
>> Call Trace:
>> <#DB>
>> __dump_stack lib/dump_stack.c:16 [inline]
>> dump_stack+0x194/0x257 lib/dump_stack.c:52
>> panic+0x1e4/0x417 kernel/panic.c:181
>> __warn+0x1c4/0x1d9 kernel/panic.c:542
>> report_bug+0x211/0x2d0 lib/bug.c:183
>> fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:178
>> do_trap_no_signal arch/x86/kernel/traps.c:212 [inline]
>> do_trap+0x260/0x390 arch/x86/kernel/traps.c:261
>> do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:298
>> do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:311
>> invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:905
>> RIP: 0010:cond_local_irq_disable arch/x86/kernel/traps.c:85 [inline]
>> RIP: 0010:do_debug+0x4d8/0x6e0 arch/x86/kernel/traps.c:790
>> RSP: 0018:ffff8801db20fe98 EFLAGS: 00010246
>> RAX: dffffc0000000000 RBX: ffff8801db20ff58 RCX: 0000000000000000
>> RDX: 1ffff1003b641ffc RSI: 0000000000000001 RDI: ffffffff85ac6398
>> RBP: ffff8801db20ff48 R08: ffff8801db20ffe8 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000004001
>> R13: ffff8801cd8541c0 R14: 1ffff1003b641fd8 R15: 0000000000004000
>> debug+0x34/0x70 arch/x86/entry/entry_64.S:1056
>> RIP: 0010:copy_user_enhanced_fast_string+0xe/0x20
>> arch/x86/lib/copy_user_64.S:180
>> RSP: 0018:ffff8801cd2cfe68 EFLAGS: 00010246
>> RAX: ffffed0039a59fe1 RBX: 0000000020000000 RCX: 000000000000003f
>> RDX: 0000000000000040 RSI: 0000000020000001 RDI: ffff8801cd2cfec9
>> RBP: ffff8801cd2cfe98 R08: ffffed0039a59fe1 R09: ffffed0039a59fe1
>> R10: 0000000000000008 R11: ffffed0039a59fe0 R12: 0000000000000040
>> R13: ffff8801cd2cfec8 R14: 00007ffffffff000 R15: 0000000020000040
>> </#DB>
>> copy_from_user include/linux/uaccess.h:146 [inline]
>> SYSC_timer_create kernel/time/posix-timers.c:579 [inline]
>> SyS_timer_create+0x89/0x120 kernel/time/posix-timers.c:572
>> entry_SYSCALL_64_fastpath+0x1f/0xbe
>> RIP: 0033:0x452719
>> RSP: 002b:00007f906f324be8 EFLAGS: 00000212 ORIG_RAX: 00000000000000de
>> RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452719
>> RDX: 0000000020000000 RSI: 0000000020000000 RDI: ffffffffffffffff
>> RBP: 0000000000000082 R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000000 R11: 0000000000000212 R12: 00000000006f3cf8
>> R13: 00000000ffffffff R14: 00007f906f3256d4 R15: 0000000000000000
>> Dumping ftrace buffer:
>> (ftrace buffer empty)
>> Kernel Offset: disabled
>> Rebooting in 86400 seconds..
>
>
> I think this is kvm bug, so +kvm maintainers.
>
> Unfortunately, this does not reproduce with a C program. But I was
> able to easily reproduce it with the provided syzkaller program by
> running:
> ./syz-execprog repro.txt
>
> On upstream 15f859ae5c43c7f0a064ed92d33f7a5bc5de6de0 (Oct 26).
> Seems that guest somehow sets debug register contents for host:

The WARNING is triggered because dr6 is set to DR_STEP.

In KVM, we only restore dr6 (via hw_breakpoint_restore()) if hw
breakpoints are active (hw_breakpoint_active()).

However, I am getting the feeling that we should restore dr6
unconditionally to current->thread.debugreg6 (as it doesn't seem to be
related to hw breakpoints only).
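
To make the difference concrete, here is a small userspace model of the
two restore policies. It is only a sketch, not kernel code: struct
model_cpu and all model_* functions are hypothetical names invented for
illustration, and the assumption that a guest run leaves DR_STEP behind
in the hardware register is just my reading of the report. Only DR_STEP
(0x4000) and the names in the comments come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

#define DR_STEP 0x4000UL /* single-step bit in dr6, same value as the kernel's */

/* Hypothetical model state, not a kernel structure. */
struct model_cpu {
	unsigned long hw_dr6;           /* stands in for the physical dr6 */
	unsigned long thread_debugreg6; /* stands in for current->thread.debugreg6 */
	bool hw_bp_active;              /* stands in for hw_breakpoint_active() */
};

/* Assumption: running the guest leaves DR_STEP set in the host's dr6. */
static void model_guest_run(struct model_cpu *cpu)
{
	cpu->hw_dr6 |= DR_STEP;
}

/* Behaviour as described above: restore only if hw breakpoints are active. */
static void model_restore_conditional(struct model_cpu *cpu)
{
	if (cpu->hw_bp_active)
		cpu->hw_dr6 = cpu->thread_debugreg6;
}

/* Suggested alternative: always restore dr6 from the thread's saved value. */
static void model_restore_unconditional(struct model_cpu *cpu)
{
	cpu->hw_dr6 = cpu->thread_debugreg6;
}

/* Walk through the scenario with no hw breakpoints active on the host. */
static void model_demo(void)
{
	struct model_cpu cpu = {
		.hw_dr6 = 0,
		.thread_debugreg6 = 0,
		.hw_bp_active = false,
	};

	model_guest_run(&cpu);
	model_restore_conditional(&cpu);
	assert(cpu.hw_dr6 & DR_STEP); /* stale bit survives the conditional restore */

	model_restore_unconditional(&cpu);
	assert(cpu.hw_dr6 == cpu.thread_debugreg6); /* stale bit is gone */
}
```

In this model the stale DR_STEP bit only survives when hw_bp_active is
false, which would match a host task that has no hw breakpoints set.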

The question would then be when we have to restore it (maybe it's
already too late at that point?).

(no expert on x86 debug regs (yet)).

--

Thanks,

David / dhildenb