Re: -next March 3: Boot failure on x86 (Oops)

From: Tejun Heo
Date: Fri Mar 05 2010 - 08:25:11 EST


Hello,

On 03/05/2010 07:44 PM, Sachin Sant wrote:
> Tejun Heo wrote:
>> On 03/05/2010 03:08 PM, Tejun Heo wrote:
>>
>>> Hmmm... this means that on one of the chunks, chunk->list.next was
>>> NULL (BTW, the disassembly is from unlinked object, right?). The main
>>> allocation code hasn't seen much change lately. The only changes are,
>>>
>>> 22b737f4c75197372d64afc6ed1bccd58c00e549 : just refactoring
>>> 833af8427be4b217b5bc522f61afdbd3f1d282c2 : possible but isn't very new
>>>
>>
>> Can you also please try reverting the above two commits?
>>
>> Thanks.
>>
>>
> Reverting both the commits allows the machine to boot.
> If i just apply 22b737f4c75197372d64afc6ed1bccd58c00e549 the
> box fails to boot with following kobject related traces:
>
> registered taskstats version 1
> kobject '' (c11d5fdc): tried to add an uninitialized object, something
> is seriously wrong.
> Pid: 1, comm: swapper Not tainted 2.6.33-autotest-next-20100305 #3
> Call Trace:
> [<c03a7678>] ? printk+0xf/0x17
> [<c028766f>] kobject_add+0x28/0x49
> [<c05a1a8e>] memmap_init+0x4f/0x89
> [<c05a1a3f>] ? memmap_init+0x0/0x89
> [<c0101139>] do_one_initcall+0x4c/0x131
> [<c057b352>] kernel_init+0x127/0x1a8
> [<c057b22b>] ? kernel_init+0x0/0x1a8
> [<c0102db6>] kernel_thread_helper+0x6/0x10
> ------------[ cut here ]------------
> WARNING: at lib/kobject.c:595 kobject_put+0x27/0x3c()
> Hardware name: eserver xSeries 235 -[86717AX]-
> kobject: '' (c11d5fdc): is not initialized, yet kobject_put() is being
> called.
> Modules linked in:
> Pid: 1, comm: swapper Not tainted 2.6.33-autotest-next-20100305 #3
> Call Trace:
> [<c012fe66>] warn_slowpath_common+0x60/0x90
> [<c012feca>] warn_slowpath_fmt+0x24/0x27
> [<c02872c6>] kobject_put+0x27/0x3c
> [<c05a1a9c>] memmap_init+0x5d/0x89
> [<c05a1a3f>] ? memmap_init+0x0/0x89
> [<c0101139>] do_one_initcall+0x4c/0x131
> [<c057b352>] kernel_init+0x127/0x1a8
> [<c057b22b>] ? kernel_init+0x0/0x1a8
> [<c0102db6>] kernel_thread_helper+0x6/0x10
> ---[ end trace 7b6574301a0037c2 ]---
>
> The results are with today's next, but i think same applies to Linus
> tree as well.

I'm having a very difficult time imagining how 22b737f4 could have
affected this, as the patch is an identical transformation of the
previous code. Also, 833af842 was released with 2.6.32 and has stayed
that way since, so this really looks like a memory overrun or random
corruption. Can you please retry with the kmalloc debugging options
turned on?
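
As a rough sketch of what "kmalloc debug" could cover here (the message
does not name specific options, so the selection below is an assumption;
which ones apply depends on whether the kernel is built with SLUB or
SLAB, and the names should be checked against the tree's Kconfig):

```
# Assumed selection, not taken from the original message.
# SLUB builds: compile in the debug code and enable it by default.
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
# SLAB builds: use this instead of the two SLUB options above.
# CONFIG_DEBUG_SLAB=y
# Extra checks that may help catch a corrupted list pointer or overrun.
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_PAGEALLOC=y
```

With CONFIG_SLUB_DEBUG=y but without CONFIG_SLUB_DEBUG_ON, the same
checks can instead be requested at boot with slub_debug=FZPU on the
kernel command line (sanity checks, red zoning, poisoning, and
alloc/free user tracking).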

Thanks.

--
tejun