Re: early exception error

From: Robert Hancock
Date: Fri Jan 02 2009 - 15:57:45 EST


Cyrill Gorcunov wrote:
[david@xxxxxxx - Fri, Jan 02, 2009 at 10:21:52AM -0800]
On Wed, 31 Dec 2008, david@xxxxxxx wrote:

On Thu, 1 Jan 2009, Andi Kleen wrote:

On Wed, Dec 31, 2008 at 12:59:08PM -0800, david@xxxxxxx wrote:
On Wed, 31 Dec 2008, Andi Kleen wrote:

In the picture you sent me I noticed the message
"Your memory is not aligned you need to rebuild your
kernel with bigger NODEMAP SIZE shift=20" and then the
SRAT code complains about "No NUMA code hash function found",
which looks a bit scary. Btw, could you post this picture
on some public resource so NUMA people could check it?
This case used to be handled cleanly (NUMA disabled), but perhaps
that has regressed. But still it sounds like something is going wrong,
unless his machine really has a very weird memory map.
it shouldn't, it was one of the high-volume servers 4-5 years ago and only
has 4G of ram in it
From looking at the screenshot Cyrill sent you seem to have a funny
SRAT with overlapping areas that is rejected in the end. I suspect the
fallback code doesn't handle this properly.

Does it work when you boot with numa=noacpi ?
it gets past the point where the bootmem_debug messages flow by, but I get
another oops (snapshot of the screen is at
http://linux.lang.hm/linux/IMG00031.jpg )
oops, I misread your mail, IMG00031.jpg was with numa=off

I just posted IMG00033.jpg which is with numa=noacpi and earlyprintk=vga but not bootmem_debug

David Lang


Thanks, David! Trying to understand what is going on :)

Here is a new picture if someone would like to jump into
the bug handling

http://linux.lang.hm/linux/IMG00033.jpg

alloc_bootmem_core is a reasonably big function, so it would be useful if we could track down what line it's blowing up on. Can you try to find out what line that fault address (ffffffff8096452a in this crash) is on, as described in Documentation/BUG-HUNTING? That is, build with CONFIG_DEBUG_INFO enabled, run gdb on vmlinux and do:

l *0xffffffff8096452a
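
If typing the address by hand is error-prone, the same lookup can be run non-interactively. The following is only a rough sketch: it assumes gdb is installed and that a vmlinux built with CONFIG_DEBUG_INFO sits in the current directory. The helper name gdb_list_command and the default "vmlinux" path are made up for illustration; -batch and -ex are standard gdb options.

```python
import shlex


def gdb_list_command(fault_addr, vmlinux="vmlinux"):
    """Build a gdb invocation that lists the source around fault_addr.

    fault_addr may be an int or a hex string, with or without "0x".
    The vmlinux path is an assumption; point it at your own build.
    """
    addr = fault_addr if isinstance(fault_addr, int) else int(fault_addr, 16)
    # "list *ADDR" is the long form of the "l *ADDR" command shown above.
    return ["gdb", "-batch", "-ex", f"list *{addr:#x}", vmlinux]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(gdb_list_command("ffffffff8096452a")))
```

The printed command can be pasted into a shell in the kernel build directory; passing the list to subprocess.run would also work if you prefer to capture the output.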
