Re: console handover badness

From: David Miller
Date: Wed Aug 13 2008 - 04:51:22 EST


From: David Miller <davem@xxxxxxxxxxxxx>
Date: Tue, 12 Aug 2008 18:40:52 -0700 (PDT)

> From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> Date: Tue, 12 Aug 2008 21:11:53 -0400 (EDT)
>
> > and then boot failure of 2.6.27-rc[12] because of bad memory
> > migratetype. Is this migratetype crash a known problem? --- the problem is
> > that starting with 2.6.27rc1, I'm getting crash with this backtrace:
> > __list_add
> > __free_pages_ok
> > __free_pages
> > __free_pages_bootmem
> > __free_all_bootmem
> > mem_init
> > start_kernel_tlb_fixup_code
> > --- the crash is due to migratetype == 5 in __free_one_page (inlined into
> > __free_pages_ok) and because there are only 5 migratetypes, it attempts
> > to add to a non-existent list.
>
> Mikulas can you send me the .config you're using in 2.6.27 to trigger
> this?

Meanwhile I tried to figure out how this can go wrong like this.

The way this stuff works this early is very simple.

The pageblock bitmaps get allocated by sparse_init() as it iterates over
each mem section, via sparse_early_usemap_alloc(). The allocation goes
through the various bootmem allocators, which zero-initialize the bitmap.

I added some debugging to sparse_early_usemap_alloc() to make sure
the size was correct and that the pointer looked sane.

What happens next is that memmap_init_zone() walks over each zone's
pages and initializes their pageblock migrate type to MIGRATE_MOVABLE,
which is "2".
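
A rough userspace sketch of that sequence, for illustration only. This
is not the kernel code: pb_get(), pb_set() and pb_boot_init() are
hypothetical helpers, the 3-bits-per-pageblock layout is an assumption,
and apart from MIGRATE_MOVABLE == 2 the enum values are not taken from
this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Only MIGRATE_MOVABLE == 2 is stated in this thread; the other
 * names and values are assumed for the sake of the model. */
enum {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES	/* == 5, the number of valid types */
};

#define PB_BITS 3	/* assumed migrate type bits per pageblock */

/* Read the PB_BITS-wide field for one pageblock, lowest bit first. */
static unsigned int pb_get(const unsigned char *map, size_t block)
{
	unsigned int val = 0;

	for (size_t i = 0; i < PB_BITS; i++) {
		size_t bit = block * PB_BITS + i;

		if (map[bit / 8] & (1u << (bit % 8)))
			val |= 1u << i;
	}
	return val;
}

/* Write the PB_BITS-wide field for one pageblock. */
static void pb_set(unsigned char *map, size_t block, unsigned int val)
{
	for (size_t i = 0; i < PB_BITS; i++) {
		size_t bit = block * PB_BITS + i;
		unsigned char mask = (unsigned char)(1u << (bit % 8));

		if (val & (1u << i))
			map[bit / 8] |= mask;
		else
			map[bit / 8] &= (unsigned char)~mask;
	}
}

/* Model of early boot: a zeroed bitmap, then every block set to
 * MIGRATE_MOVABLE. */
static void pb_boot_init(unsigned char *map, size_t map_bytes, size_t nblocks)
{
	assert(nblocks * PB_BITS <= map_bytes * 8);
	memset(map, 0, map_bytes);	/* bootmem hands back zeroed memory */
	for (size_t b = 0; b < nblocks; b++)
		pb_set(map, b, MIGRATE_MOVABLE);	/* memmap_init_zone() step */
}
```

In this model every pb_get() returns 0 before the init pass and 2 after
it, so no legitimate step in the sequence stores a 5.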

So given the simplicity of that stuff, I can only imagine that something
is writing all over the bitmaps, clobbering them somehow.
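
To make the clobbering theory concrete, here is a hedged sketch of the
kind of sanity scan one could run over such a bitmap. It is a userspace
model, not kernel code: first_bad_pageblock() is a hypothetical helper,
and the 3-bit field width and the limit of 5 valid types are assumptions
based on the report above. With 3 bits per pageblock the values 5, 6
and 7 are representable but invalid, so a stray write can easily
produce one.

```c
#include <assert.h>
#include <stddef.h>

#define MIGRATE_TYPES	5	/* valid values are 0..4, per the report */
#define PB_BITS		3	/* assumed migrate type bits per pageblock */

/*
 * Return the index of the first pageblock whose stored migrate type
 * is out of range, or -1 if all nblocks entries look sane.
 */
static long first_bad_pageblock(const unsigned char *map, size_t nblocks)
{
	assert(map != NULL);

	for (size_t block = 0; block < nblocks; block++) {
		unsigned int val = 0;

		/* Reassemble the field, lowest bit first. */
		for (size_t i = 0; i < PB_BITS; i++) {
			size_t bit = block * PB_BITS + i;

			if (map[bit / 8] & (1u << (bit % 8)))
				val |= 1u << i;
		}
		if (val >= MIGRATE_TYPES)
			return (long)block;
	}
	return -1;
}
```

A scan like this, run right after the init pass and again just before
the bootmem pages are freed, would show whether the bitmap was already
bad at init time or got overwritten in between.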

I'll try to reproduce this here so I can try to narrow down the cause
a bit more, but so far my attempts have not been successful.