Re: 4.14.9 with CONFIG_MCORE2 fails to boot

From: Alexander Tsoy
Date: Fri Dec 29 2017 - 18:18:16 EST


On Fri, 29/12/2017 at 14:09 -0800, Linus Torvalds wrote:
>
...
> The fact that double faults seem to be implicated does make me want to
> try to disable that ESPFIX64 code in the #DF handler.
>
> What happens if you take a failing kernel, and then in
> arch/x86/kernel/traps.c do_double_fault(), you change the
>
>   #ifdef CONFIG_X86_ESPFIX64
>
> to just a
>
>   #if 0
>
> do you then get an actual double-fault oops report instead of the
> stall (and NMI oops)?
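
For anyone following along, the suggested change can be sketched outside the kernel roughly as below. This is only an illustration of the preprocessor effect, not the actual traps.c code: the enum, both function names, and the special_case parameter are hypothetical, and the CONFIG_X86_ESPFIX64 define merely stands in for the Kconfig option being enabled.

```c
/* Stand-in for the Kconfig option being enabled (assumption for this sketch). */
#define CONFIG_X86_ESPFIX64 1

/* Hypothetical outcomes of a double-fault handler. */
enum df_path {
    DF_SPECIAL_CASE,  /* the guarded block handled the fault and returned */
    DF_OOPS_REPORT    /* fell through to the normal double-fault report */
};

/* Original shape: the guarded block is compiled in when the option is set. */
enum df_path df_with_ifdef(int special_case)
{
#ifdef CONFIG_X86_ESPFIX64
    if (special_case)
        return DF_SPECIAL_CASE;
#endif
    return DF_OOPS_REPORT;
}

/* Suggested experiment: "#if 0" removes the block regardless of the option. */
enum df_path df_with_if0(int special_case)
{
#if 0
    if (special_case)
        return DF_SPECIAL_CASE;
#endif
    (void)special_case;  /* unused once the guarded block is compiled out */
    return DF_OOPS_REPORT;
}
```

With the block compiled out, every call takes the report path, which is the point of the experiment: whatever the guarded code was doing can no longer hide the oops.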

This is what I get after disabling ESPFIX64 (see attachment).

[ 0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20170831/tbfadt-603)
[ 0.000000] ACPI BIOS Warning (bug): Incorrect checksum in table [TCPA] - 0x00, should be 0x7F (20170831/tbprint-211)
[ 0.499855] Expanded resource Reserved due to conflict with PCI Bus 0000:00
[ 0.506002] Expanded resource Reserved due to conflict with PCI Bus 0000:00
[ 21.777011] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 21.778008] 0-...!: (0 ticks this GP) idle=b0a/140000000000000/0 softirq=158/158 fqs=0
[ 21.778008] (detected by 1, t=21002 jiffies, g=-254, c=-255, q=4)
[ 0.776477] NMI backtrace for cpu 0
[ 0.776477] CPU: 0 PID: 114 Comm: modprobe Not tainted 4.15.0-rc5+ #6
[ 0.776477] Hardware name: Dell Inc. OptiPlex 760 /0M858N, BIOS A16 08/06/2013
[ 0.776477] RIP: 0010:paranoid_entry+0x0/0x70
[ 0.776477] RSP: 0000:fffffe8000007f50 EFLAGS: 00000083
[ 0.776477] RAX: 00000000b7c00000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 0.776477] RDX: 00000000ffff951a RSI: 0000000000000000 RDI: fffffe8000007f58
[ 0.776477] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 0.776477] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa28b5b36
[ 0.776477] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.776477] FS: 0000000000000000(0000) GS:ffff951ab7c00000(0000) knlGS:0000000000000000
[ 0.776477] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.776477] CR2: fffffe8000006f08 CR3: 000000022d232000 CR4: 00000000000406f0
[ 0.776477] Call Trace:
[ 0.776477] <#DF>
[ 0.776477] double_fault+0xc/0x30
[ 0.776477] RIP: 0010:do_double_fault+0xb/0xb0
[ 0.776477] RSP: 0000:fffffe8000006f18 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[ 0.776477] RAX: 00000000b7c00000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 0.776477] RDX: 00000000ffff951a RSI: 0000000000000000 RDI: fffffe8000007f58
[ 0.776477] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 0.776477] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa28b5b36
[ 0.776477] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.776477] ? page_fault+0x36/0x60
[ 0.776477] </#DF>
[ 0.776477] Code: 00 00 00 48 89 e7 31 f6 ff 15 45 02 57 00 e9 88 00 00 00 e8 93 00 00 00 48 89 e7 31 f6 ff 15 30 02 57 00 e9 43 01 00 00 0f 1f 00 <fc> 4c 89 5c 24 38 4c 89 54 24 40 4c 89 4c 24 48 4c 89 44 24 50
[ 21.778008] rcu_preempt kthread starved for 21002 jiffies! g18446744073709551362 c18446744073709551361 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[ 21.778008] Call Trace:
[ 21.778008] ? __schedule+0x37f/0x7b0
[ 21.778008] ? preempt_count_add+0x64/0xa0
[ 21.778008] schedule+0x4a/0xa0
[ 21.778008] schedule_timeout+0x179/0x380
[ 21.778008] ? __next_timer_interrupt+0xd0/0xd0
[ 21.778008] rcu_gp_kthread+0x96b/0x1050
[ 21.778008] ? calc_global_load_tick+0x61/0x70
[ 21.778008] kthread+0xff/0x130
[ 21.778008] ? force_qs_rnp+0x1d0/0x1d0
[ 21.778008] ? kthread_create_worker_on_cpu+0x70/0x70
[ 21.778008] ret_from_fork+0x1f/0x30
[ 84.782011] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 84.783008] 0-...0: (0 ticks this GP) idle=b0a/140000000000000/0 softirq=158/158 fqs=15691
[ 84.783008] (detected by 1, t=84007 jiffies, g=-254, c=-255, q=4)
[ 0.776477] NMI backtrace for cpu 0
[ 0.776477] CPU: 0 PID: 114 Comm: modprobe Not tainted 4.15.0-rc5+ #6
[ 0.776477] Hardware name: Dell Inc. OptiPlex 760 /0M858N, BIOS A16 08/06/2013
[ 0.776477] RIP: 0010:double_fault+0x0/0x30
[ 0.776477] RSP: 0000:fffffe8000007fd0 EFLAGS: 00000086
[ 0.776477] RAX: 00000000b7c00000 RBX: 0000000000000001 RCX: 00000000c0000101
[ 0.776477] RDX: 00000000ffff951a RSI: 0000000000000000 RDI: fffffe8000007f58
[ 0.776477] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 0.776477] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa28b5b36
[ 0.776477] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.776477] FS: 0000000000000000(0000) GS:ffff951ab7c00000(0000) knlGS:0000000000000000
[ 0.776477] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.776477] CR2: fffffe8000006f08 CR3: 000000022d232000 CR4: 00000000000406f0
[ 0.776477] Call Trace:
[ 0.776477] <#DF>
[ 0.776477] do_double_fault+0xb/0xb0
[ 0.776477] </#DF>
[ 0.776477] Code: 05 00 00 48 89 e7 31 f6 e8 ae 5f 56 ff e9 19 06 00 00 e8 54 05 00 00 48 89 e7 31 f6 e8 9a 5f 56 ff e9 05 06 00 00 0f 1f 44 00 00 <66> 66 90 48 83 c4 88 e8 b4 04 00 00 48 89 e7 48 8b 74 24 78 48
[ 147.787011] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 147.788008] 0-...0: (0 ticks this GP) idle=b0a/140000000000000/0 softirq=158/158 fqs=31437
[ 147.788008] (detected by 1, t=147012 jiffies, g=-254, c=-255, q=4)