Re: 2.6.34-rc4 : OOPS in unmap_vma

From: Vivek Goyal
Date: Wed Apr 14 2010 - 12:08:12 EST


On Wed, Apr 14, 2010 at 05:22:31PM +0200, Borislav Petkov wrote:
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Date: Wed, Apr 14, 2010 at 07:32:08AM -0700
>
> Hi Linus,
>
> > On Wed, 14 Apr 2010, Borislav Petkov wrote:
> > >
> > > hmm, it doesn't look like it. Your code translates to something like
> > >
> > > 0: b8 00 00 00 00 mov $0x0,%eax
> > > 5: 80 ff ff cmp $0xff,%bh
> > > 8: ff 48 21 decl 0x21(%rax)
> > > b: 45 80 48 8b 45 rex.RB orb $0x45,-0x75(%r8)
> > > 10: 80 48 ff c8 orb $0xc8,-0x1(%rax)
> >
> > There's a large constant (0xffffff8000000000) in there at the beginning,
> > and the disassembly hasn't found the start of the next instruction very
> > cleanly. The same is true at the end: another large constant is cut off in
> > the middle.
> >
> > The byte just before the dumped instruction stream is almost certainly
> > '48h', and the last byte of the last constant is 0xff, and the disassembly
> > ends up being:
> >
> > 0: 48 b8 00 00 00 00 80 mov $0xffffff8000000000,%rax
> > 7: ff ff ff
> > a: 48 21 45 80 and %rax,-0x80(%rbp)
> > e: 48 8b 45 80 mov -0x80(%rbp),%rax
> > 12: 48 ff c8 dec %rax
> > 15: 48 3b 85 40 ff ff ff cmp -0xc0(%rbp),%rax
> > 1c: 48 8b 85 50 ff ff ff mov -0xb0(%rbp),%rax
> > 23: 48 0f 42 7d 80 cmovb -0x80(%rbp),%rdi
> > 28: 48 89 7d 80 mov %rdi,-0x80(%rbp)
> > 2c:* 48 8b 38 mov (%rax),%rdi <-- trapping instruction
> > 2f: 48 85 ff test %rdi,%rdi
> > 32: 0f 84 f5 04 00 00 je 0x52d
> > 38: 48 b8 fb 0f 00 00 00 mov $0xffffc00000000ffb,%rax
> > 3f: c0 ff ff
> >
> > But yes, you found the right spot (that 0xffffff8000000000 constant is
> > -549755813888 decimal):
>
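As a quick sanity check of the decimal value quoted above, here is a minimal Python sketch that reinterprets the 64-bit immediate as a two's-complement signed integer. The helper name as_signed64 is made up for illustration. It also shows that the value is exactly -(1 << 39), which is what a mask clearing the low 39 bits of an address looks like.

```python
def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as two's-complement signed."""
    value &= (1 << 64) - 1           # keep only the low 64 bits
    if value & (1 << 63):            # sign bit set -> negative number
        return value - (1 << 64)
    return value

CONSTANT = 0xffffff8000000000        # immediate from the "mov ...,%rax" above

signed = as_signed64(CONSTANT)
print(signed)                        # decimal value of the constant
print(signed == -(1 << 39))          # is it exactly -(2**39)?
```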
> Right, the decodecode output looked kinda strange to me and I tried
> to match the instruction order and find the location. But yeah, now
> that I'm looking at show_registers(), we don't start dumping on a precise
> instruction boundary but simply dump 64 bytes in the default case. No time
> for an instruction decoder along that path :).
>
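To see why the first decode went wrong, the sketch below (plain Python, no disassembler, variable names purely illustrative) takes the leading bytes as they appear in the misaligned listing, prepends the 0x48 REX.W prefix that Linus inferred, and reads the eight bytes after the "48 b8" opcode as a little-endian immediate. It does the same for the constant at the end, assuming the dump stops one byte short and that the missing byte is 0xff, as stated above.

```python
# Leading bytes exactly as they appear in the misaligned listing.
dumped_head = bytes.fromhex("b8 00 00 00 00 80 ff ff ff")

# Prepend the byte that sat just before the dump: 0x48 is a REX.W prefix,
# and "48 b8" followed by 8 bytes is "mov $imm64,%rax".
fixed_head = b"\x48" + dumped_head
head_imm = int.from_bytes(fixed_head[2:10], "little")
print(hex(head_imm))

# Trailing bytes of the dump; assumption: the final 0xff was cut off.
dumped_tail = bytes.fromhex("48 b8 fb 0f 00 00 00 c0 ff")
fixed_tail = dumped_tail + b"\xff"
tail_imm = int.from_bytes(fixed_tail[2:10], "little")
print(hex(tail_imm))
```

With the prefix byte restored, both immediates match the constants in the corrected disassembly, and every following instruction starts on the right byte.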
> > > which I could correlate with what I get here (comments added):
> >
> > Yup. Close enough. Btw, it's often good to look at both the *.s code _and_
> > the *.lst code. If you do "make mm/memory.lst", you'll find those big
> > constants easily, and then you'll see the code this way:
>
> [..]
>
> ok, I can't say that I'm a linux newbie but the .lst code is new to me.
> Damn, and I thought I knew it all :)
>
> > > so it looks like it tries to find a page table rooted at that address
> > > but the pointer value of 0000000000002203 is bogus.
> >
> > Yes, it does look like some strange page table corruption, doesn't look
> > anon_vma related at all. It's intriguing that it started happening now,
> > though, so..
>
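One way to see why 0000000000002203 cannot be a valid kernel pointer: assuming x86_64 with 4-level paging (48-bit virtual addresses), kernel virtual addresses live in the upper canonical half, at or above 0xffff800000000000. The check below is only an illustration. KERNEL_HALF_START and looks_like_kernel_pointer are names invented here, not kernel symbols.

```python
# Assumption: 48-bit virtual addresses, so the kernel half of the
# canonical address space starts here.
KERNEL_HALF_START = 0xffff800000000000
ADDRESS_MAX = 0xffffffffffffffff

def looks_like_kernel_pointer(addr):
    """Return True if addr falls in the upper (kernel) canonical half."""
    return KERNEL_HALF_START <= addr <= ADDRESS_MAX

bogus = 0x0000000000002203           # pointer value from the oops
print(looks_like_kernel_pointer(bogus))
print(looks_like_kernel_pointer(0xffff800000001000))
```

A value as small as 0x2203 sits in the low user-space range, so dereferencing it from the page table walk faults immediately.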
> Well, Parag said something about a kexec kernel, so it would be good to
> know what he means there: a kexec-enabled kernel, or the "second" kernel
> his machine kexec'd into after a previous failure. I think this could
> clarify the situation a bit.

FWIW, just a data point: I pulled in the latest kernel and I can boot it
both through the BIOS and via kexec on my x86_64 box.

Vivek