Re: 4.14.9 with CONFIG_MCORE2 fails to boot

From: Alexander Tsoy
Date: Fri Dec 29 2017 - 09:44:20 EST


On Fri, 29/12/2017 at 17:31 +0300, Alexander Tsoy wrote:
> On Fri, 29/12/2017 at 10:17 +0100, Greg KH wrote:
> > On Thu, Dec 28, 2017 at 12:33:22PM +0300, Alexander Tsoy wrote:
> > > Hello,
> > >
> > > 4.14.9 fails to boot if CONFIG_MCORE2 is enabled and the kernel is
> > > compiled with gcc 6+. More details in the following bug reports:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=198263
> > > https://bugs.gentoo.org/642268
> > >
> > > I bisected it to the commit below:
> > >
> > > $ git bisect good
> > > 2bc9fa0beaf10206a778f02e9e5cb62f50345b1a is the first bad commit
> > > commit 2bc9fa0beaf10206a778f02e9e5cb62f50345b1a
> > > Author: Andy Lutomirski <luto@xxxxxxxxxx>
> > > Date:   Mon Dec 4 15:07:23 2017 +0100
> > >
> > >     x86/entry/64: Use a per-CPU trampoline stack for IDT entries
> > >
> > >     commit 7f2590a110b837af5679d08fc25c6227c5a8c497 upstream.
> > >
> > >     Historically, IDT entries from usermode have always gone directly
> > >     to the running task's kernel stack.  Rearrange it so that we enter on
> > >     a per-CPU trampoline stack and then manually switch to the task's stack.
> > >     This touches a couple of extra cachelines, but it gives us a chance
> > >     to run some code before we touch the kernel stack.
> > >
> > >     The asm isn't exactly beautiful, but I think that fully refactoring
> > >     it can wait.
> > >
> > >     Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
> > >     Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > >     Reviewed-by: Borislav Petkov <bp@xxxxxxx>
> > >     Reviewed-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> > >     Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > >     Cc: Borislav Petkov <bp@xxxxxxxxx>
> > >     Cc: Borislav Petkov <bpetkov@xxxxxxx>
> > >     Cc: Brian Gerst <brgerst@xxxxxxxxx>
> > >     Cc: Dave Hansen <dave.hansen@xxxxxxxxx>
> > >     Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > >     Cc: David Laight <David.Laight@xxxxxxxxxx>
> > >     Cc: Denys Vlasenko <dvlasenk@xxxxxxxxxx>
> > >     Cc: Eduardo Valentin <eduval@xxxxxxxxxx>
> > >     Cc: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> > >     Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> > >     Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > >     Cc: Juergen Gross <jgross@xxxxxxxx>
> > >     Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> > >     Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > >     Cc: Rik van Riel <riel@xxxxxxxxxx>
> > >     Cc: Will Deacon <will.deacon@xxxxxxx>
> > >     Cc: aliguori@xxxxxxxxxx
> > >     Cc: daniel.gruss@xxxxxxxxxxxxxx
> > >     Cc: hughd@xxxxxxxxxx
> > >     Cc: keescook@xxxxxxxxxx
> > >     Link: https://lkml.kernel.org/r/20171204150606.225330557@linutronix.de
> > >     Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> > >     Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> > >
> > > :040000 040000 275d4746936a9e521a2b5041856f7dc1d1820dc6
> > > 8f8e869fd59c3dd781dceffa76e53e41d733a0cf M      arch
> > >
> > > $ git bisect log
> > > git bisect start
> > > # bad: [dad5c1402c570cd07a80113784bc20a7f930c8ae] Linux 4.14.9
> > > git bisect bad dad5c1402c570cd07a80113784bc20a7f930c8ae
> > > # good: [7b3775017f4e6b87dfd2c7f63d1eaf057948f31d] Linux 4.14.8
> > > git bisect good 7b3775017f4e6b87dfd2c7f63d1eaf057948f31d
> > > # good: [d120cd749ef9770ee98b708a83b49547dcf1c0e1] x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0
> > > git bisect good d120cd749ef9770ee98b708a83b49547dcf1c0e1
> > > # bad: [97f41b41c432e5a80c91445d92c2f4b729984d36] powerpc/xmon: Avoid tripping SMP hardlockup watchdog
> > > git bisect bad 97f41b41c432e5a80c91445d92c2f4b729984d36
> > > # bad: [bfd66a406fe7e590055c1d6714adc697f18664c8] PCI: Avoid bus reset if bridge itself is broken
> > > git bisect bad bfd66a406fe7e590055c1d6714adc697f18664c8
> > > # bad: [8388d287e361a2fd0a39bece30a736d692d5c3d8] x86/cpufeatures: Make CPU bugs sticky
> > > git bisect bad 8388d287e361a2fd0a39bece30a736d692d5c3d8
> > > # bad: [bb568391775d4a840992e2d2493f39d6e86401e3] x86/entry/64: Move the IST stacks into struct cpu_entry_area
> > > git bisect bad bb568391775d4a840992e2d2493f39d6e86401e3
> > > # bad: [2bc9fa0beaf10206a778f02e9e5cb62f50345b1a] x86/entry/64: Use a per-CPU trampoline stack for IDT entries
> > > git bisect bad 2bc9fa0beaf10206a778f02e9e5cb62f50345b1a
> > > # good: [c3dbef1bd0f7eb09daf49409ea533aa1b0eeb82e] x86/espfix/64: Stop assuming that pt_regs is on the entry stack
> > > git bisect good c3dbef1bd0f7eb09daf49409ea533aa1b0eeb82e
> > > # first bad commit: [2bc9fa0beaf10206a778f02e9e5cb62f50345b1a] x86/entry/64: Use a per-CPU trampoline stack for IDT entries
> >
> > Thanks for letting us know.  Does Linus's current tree also have
> > this same problem for you?
>
> Just tested Linus's master branch and it has the same problem. All I
> can catch with a serial console is the following:
>
> [    0.000000] ACPI BIOS Warning[    0.498898] Expanded resource
> conflict with PCI Bus 0000:00

Ooops. This one is correct:

[    0.000000] ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20170831/tbfadt-603)
[    0.000000] ACPI BIOS Warning (bug): Incorrect checksum in table [TCPA] - 0x00, should be 0x7F (20170831/tbprint-211)
[    0.499627] Expanded resource Reserved due to conflict with PCI Bus 0000:00
[    0.506002] Expanded resource Reserved due to conflict with PCI Bus 0000:00
[   21.776011] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   21.777008]  0-...!: (0 ticks this GP) idle=c56/140000000000000/0 softirq=73/73 fqs=0
[   21.777008]  (detected by 1, t=21002 jiffies, g=-255, c=-256, q=4)
[    0.775461] NMI backtrace for cpu 0
[    0.775461] CPU: 0 PID: 114 Comm: modprobe Not tainted 4.15.0-rc5+ #1
[    0.775461] Hardware name: Dell Inc. OptiPlex 760                 /0M858N, BIOS A16 08/06/2013
[    0.775461] RIP: 0010:paranoid_entry+0x58/0x70
[    0.775461] RSP: 0000:fffffe8000007f50 EFLAGS: 00000083
[    0.775461] RAX: 0000000077c00000 RBX: 0000000000000001 RCX: 00000000c0000101
[    0.775461] RDX: 00000000ffffa035 RSI: 0000000000000000 RDI: fffffe8000007f58
[    0.775461] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[    0.775461] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffaecb5b36
[    0.775461] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[    0.775461] FS:  0000000000000000(0000) GS:ffffa03577c00000(0000) knlGS:0000000000000000
[    0.775461] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.775461] CR2: fffffe8000006f08 CR3: 000000022952c000 CR4: 00000000000406f0
[    0.775461] Call Trace:
[    0.775461]  <#DF>
[    0.775461]  ? double_fault+0xc/0x30
[    0.775461]  ? page_fault+0x36/0x60
[    0.775461]  do_double_fault+0xb/0x130
[    0.775461]  </#DF>
[    0.775461] Code: 78 4c 89 7c 24 08 4c 89 74 24 10 4c 89 6c 24 18 4c 89 64 24 20 48 89 6c 24 28 48 89 5c 24 30 bb 01 00 00 00 b9 01 01 00 c0 0f 32 <85> d2 78 05 0f 01 f8 31 db c3 0f 1f 40 00 66 2e 0f 1f 84 00 00
[   21.777008] rcu_preempt kthread starved for 21002 jiffies! g18446744073709551361 c18446744073709551360 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=0
[   21.777008] Call Trace:
[   21.777008]  ? __schedule+0x37f/0x7b0
[   21.777008]  ? preempt_count_add+0x64/0xa0
[   21.777008]  schedule+0x4a/0xa0
[   21.777008]  schedule_timeout+0x179/0x380
[   21.777008]  ? __next_timer_interrupt+0xd0/0xd0
[   21.777008]  rcu_gp_kthread+0x96b/0x1050
[   21.777008]  ? calc_global_load_tick+0x61/0x70
[   21.777008]  kthread+0xff/0x130
[   21.777008]  ? force_qs_rnp+0x1d0/0x1d0
[   21.777008]  ? kthread_create_worker_on_cpu+0x70/0x70
[   21.777008]  ret_from_fork+0x1f/0x30
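
A note on reading the register dump above: RCX holds 0xc0000101, which is
MSR_GS_BASE, and in the Code: line the bytes "0f 32" right before the marked
"<85> d2" decode as rdmsr followed by test %edx,%edx. So at the reported RIP,
RDX:RAX should hold the GS base that was just read. Below is a minimal sketch
of that arithmetic, assuming RAX is 0000000077c00000 and that RDX:RAX really
are the rdmsr result at that point. The helper names are made up for
illustration and are not kernel functions.

```python
# Values copied from the NMI backtrace register dump (RAX taken as ...77c00000).
RAX = 0x0000000077C00000
RDX = 0x00000000FFFFA035
RCX = 0x00000000C0000101
GS_FROM_DUMP = 0xFFFFA03577C00000  # "GS:ffffa03577c00000(0000)" in the dump

MSR_GS_BASE = 0xC0000101  # architectural MSR number for the GS base


def msr_value(edx: int, eax: int) -> int:
    """Combine the two 32-bit halves returned by rdmsr into one 64-bit value."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def high_half_is_negative(edx: int) -> bool:
    """Mirror 'test %edx,%edx; js': true when bit 31 of EDX is set,
    i.e. the 64-bit GS base looks like a kernel address."""
    return bool(edx & 0x80000000)


value = msr_value(RDX, RAX)
print(hex(value))                  # 0xffffa03577c00000
print(RCX == MSR_GS_BASE)          # True
print(value == GS_FROM_DUMP)       # True
print(high_half_is_negative(RDX))  # True
```

With these values the combined result matches the GS base printed in the dump
and the sign bit is set, which is consistent with the "78 05" (js) that
follows skipping the "0f 01 f8" (swapgs). This only cross-checks the dump
against itself; it says nothing about why the double fault happens.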