Re: early exception error

From: Cyrill Gorcunov
Date: Wed Dec 31 2008 - 14:12:29 EST


[david@xxxxxxx - Wed, Dec 31, 2008 at 12:07:33PM -0800]
> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>
>> [david@xxxxxxx - Wed, Dec 31, 2008 at 11:12:12AM -0800]
>>> On Wed, 31 Dec 2008, Cyrill Gorcunov wrote:
>>>
>>>> [david@xxxxxxx - Tue, Dec 30, 2008 at 05:39:29PM -0800]
>>>>> On Wed, 31 Dec 2008, Andi Kleen wrote:
>>>>>
>>>>>> david@xxxxxxx writes:
>>>>>>>
>>>>>>> doing a grep through System.map for the address that appears in the
>>>>>>> error returns nothing
>>>>>>
>>>>>> This might be obvious, but you can't grep directly for these addresses
>>>>>> because System.map contains the starting addresses of functions only
>>>>>> and normally the reported address is somewhere in the middle of a
>>>>>> function. So you instead have to look for the highest address lower than
>>>>>> or equal to the address from the exception.
>>>>>
>>>>> thanks, this was not obvious to me
>>>>>
>>>>> the -2 error maps to
>>>>>
>>>>> ffffffff8099e4c1 T free_bootmem_node
>>>>> ffffffff8099e4e5 t alloc_bootmem_core
>>>>> ffffffff8099e774 t ___alloc_bootmem_nopanic
>>>>>
>>>>>
>>>>> the first error maps to
>>>>>
>>>>> ffffffff809c2de4 T free_bootmem_node
>>>>> ffffffff809c2e08 t alloc_bootmem_core
>>>>> ffffffff809c3097 t ___alloc_bootmem_nopanic
>>>>>
>>>>>
>>>>> so it looks like this is in alloc_bootmem_core in both cases.
>>>>>
>>>>> David Lang
>>>>>
>>>>
>>>> Along with Andi's proposed earlyprintk=vga I think the
>>>> bootmem_debug option could be useful here too.
>>>
>>> adding bootmem_debug creates so much additional output that the oops
>>> scrolls off the screen (except the last 'paragraph' of it)
>>>
>>> it looks like it's individual items being allocated (trying to scan it as
>>> it scrolled by)
>>
>> on the picture you sent me I noticed the message
>> "Your memory is not aligned you need to rebuild your
>> kernel with bigger NODEMAP SIZE shift=20" and then the
>> SRAT code complains about "No NUMA code hash function found"
>> which looks a bit scary. Btw, could you post this picture
>> on some public resource so NUMA people could check it?
>
> http://linux.lang.hm/linux/IMG00030.jpg
>
> I'll try rebuilding with a bigger nodemap size and let you know
>
> David Lang
>

Also, you could just pass numa=off and check whether it helps.
(Even if it helps, that would not mean the problem is gone,
only that it has become hidden.)

- Cyrill -
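
For reference, the System.map lookup Andi describes earlier in the thread
(find the highest symbol address that is lower than or equal to the
faulting address) can be sketched roughly as below. This is only an
illustration, not a tool from the thread: the helper names
load_system_map and resolve are made up, the sample lines are the three
entries David quoted, and FAULT_ADDR is a hypothetical value because the
actual exception address is not given in this message.

```python
import bisect


def load_system_map(lines):
    """Parse 'address type name' lines into a list sorted by address."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, addr):
    """Return (name, offset) for the last symbol starting at or below addr.

    Returns None if addr lies below the first symbol in the map.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, name = symbols[i]
    return name, addr - start


# The three System.map entries quoted in the thread.
SAMPLE = """\
ffffffff8099e4c1 T free_bootmem_node
ffffffff8099e4e5 t alloc_bootmem_core
ffffffff8099e774 t ___alloc_bootmem_nopanic
"""

symbols = load_system_map(SAMPLE.splitlines())

# Hypothetical faulting address, chosen only to fall inside the range above.
FAULT_ADDR = 0xFFFFFFFF8099E600

name, offset = resolve(symbols, FAULT_ADDR)
print(f"{name}+{offset:#x}")  # prints "alloc_bootmem_core+0x11b"
```

With a real System.map, the lines would come from the file instead of
SAMPLE, e.g. load_system_map(open("System.map")), assuming the map was
generated from the same kernel build that produced the oops.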