Re: v4.10: kernel stack frame pointer .. has bad value (null)

From: Pavel Machek
Date: Mon Mar 06 2017 - 11:47:55 EST


On Thu 2017-03-02 17:45:14, Josh Poimboeuf wrote:
> On Fri, Feb 24, 2017 at 11:04:39PM -0600, Josh Poimboeuf wrote:
> > On Thu, Feb 23, 2017 at 09:10:39PM +0100, Pavel Machek wrote:
> > > Hi!
> > >
> > >
> > > > > > > Somehow, startup_32_smp() is on the stack twice. The stack unwind led
> > > > > > > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > > > > > > 0xf50cdfa8 (which is where it should normally be). So the question is
> > > > > > > how startup_32_smp() got executed the second time, with the wrong stack
> > > > > > > offset.
> > > > > >
> > > > > > Not much idea... but this is stack dump, right? Just because some
> > > > > > value is on the stack does not mean it is a return address, no?
> > > > >
> > > > > Right, but the one at 0xf50cdfa8 is where the startup_32_smp() is
> > > > > *supposed* to be. If the unwinder had unwinded to that one, it wouldn't
> > > > > have complained. So it looks to me like the CPU somehow booted twice:
> > > > > the first time at the right stack address, and the second time it
> > > > > somehow ended up with a different stack address.
> > > > >
> > > > > > And .... startup_32_smp is kind of "interesting" function. Take a
> > > > > > look...
> > > > >
> > > > > Yes, it's used in bringing up the CPU.
> > > >
> > > > Can you share your .config?
> > >
> > > Here you go...
> >
> > What version of gcc are you using?
> >
> > Can you post a disassembly of the first 10 instructions of
> > start_secondary()?
>
> Pavel, ping? I'd like to try to get to the bottom of this issue soon.
>
> I asked for the gcc version and the disassembly of start_secondary()
> because I suspect gcc may have done a funky stack alignment prologue
> which copies the return address on the stack a second time after
> aligning it.

Sorry for the delay. This is on v4.11-rc1, but v4.10 should look similar.

pavel@duo:~$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2

And here's the disassembly:

c402d200 <start_secondary>:
c402d200: 57 push %edi
c402d201: 8d 7c 24 08 lea 0x8(%esp),%edi
c402d205: 83 e4 f8 and $0xfffffff8,%esp
c402d208: ff 77 fc pushl -0x4(%edi)
c402d20b: 55 push %ebp
c402d20c: 89 e5 mov %esp,%ebp
c402d20e: 57 push %edi
c402d20f: 56 push %esi
c402d210: 83 ec 10 sub $0x10,%esp
c402d213: e8 78 78 ff ff call c4024a90 <cpu_init>
c402d218: ff 15 d0 d7 0c c5 call *0xc50cd7d0
c402d21e: 8b 15 00 53 05 c5 mov 0xc5055300,%edx
c402d224: 8d 75 e8 lea -0x18(%ebp),%esi
c402d227: 64 a1 f4 c0 1d c5 mov %fs:0xc51dc0f4,%eax
c402d22d: 89 45 e8 mov %eax,-0x18(%ebp)
c402d230: b8 20 00 00 00 mov $0x20,%eax
c402d235: ff 52 78 call *0x78(%edx)
c402d238: 8b 15 00 53 05 c5 mov 0xc5055300,%edx
c402d23e: ff 52 4c call *0x4c(%edx)
c402d241: e8 ea 2c 00 00 call c402ff30 <apic_ap_setup>
c402d246: 8b 45 e8 mov -0x18(%ebp),%eax
c402d249: e8 42 fb ff ff call c402cd90 <smp_store_cpu_info>
c402d24e: e8 5d 37 fd ff call c40009b0 <calibrate_delay>
c402d253: 8b 55 e8 mov -0x18(%ebp),%edx
c402d256: b8 00 c0 1d c5 mov $0xc51dc000,%eax
c402d25b: 8b 0d 88 d6 0b c5 mov 0xc50bd688,%ecx
c402d261: f6 05 fa fc 13 c5 04 testb $0x4,0xc513fcfa
c402d268: 8b 14 95 20 52 05 c5 mov -0x3afaade0(,%edx,4),%edx
c402d26f: 89 8c 10 c4 00 00 00 mov %ecx,0xc4(%eax,%edx,1)
c402d276: 0f 85 24 01 00 00 jne c402d3a0 <start_secondary+0x1a0>
c402d27c: 64 a1 f4 c0 1d c5 mov %fs:0xc51dc0f4,%eax
c402d282: e8 49 fb ff ff call c402cdd0 <set_cpu_sibling_map>
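
The first six instructions look like the stack alignment prologue
suspected above. Below is a minimal Python sketch (a toy model, not
kernel code) that replays just those six instructions on a dictionary
standing in for the stack. It assumes %esp was 0xf50cdfa8 on entry, since
that is where the startup_32_smp() return address is said to belong. The
return address value and the initial %edi/%ebp contents are hypothetical
placeholders.

```python
MASK32 = 0xFFFFFFFF


def simulate_prologue(entry_esp, ret_addr, edi=0x11111111, ebp=0x22222222):
    """Replay the first six instructions of start_secondary() shown above.

    edi/ebp defaults are hypothetical register contents; they only need
    to differ from ret_addr so the copies can be told apart.
    """
    mem = {entry_esp: ret_addr}  # at entry, (%esp) holds the return address
    esp = entry_esp

    def push(value):
        nonlocal esp
        esp = (esp - 4) & MASK32
        mem[esp] = value

    push(edi)                      # push  %edi
    edi = (esp + 8) & MASK32       # lea   0x8(%esp),%edi
    esp &= 0xFFFFFFF8              # and   $0xfffffff8,%esp
    push(mem[(edi - 4) & MASK32])  # pushl -0x4(%edi)  (return address again)
    push(ebp)                      # push  %ebp
    ebp = esp                      # mov   %esp,%ebp

    copies = sorted(addr for addr, val in mem.items() if val == ret_addr)
    return copies, ebp


# 0xc4000123 is a made-up return address into startup_32_smp().
copies, frame = simulate_prologue(0xF50CDFA8, ret_addr=0xC4000123)
print([hex(a) for a in copies], hex(frame))
# -> ['0xf50cdf9c', '0xf50cdfa8'] 0xf50cdf98
```

Under that assumption the second copy of the return address lands at
0xf50cdf9c, directly above the saved %ebp at 0xf50cdf98, which would
match the two startup_32_smp() entries discussed above without the CPU
having booted twice.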

Let me know if I should go back to v4.10 and retry.

Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
