Re: Make sure we populate the initroot filesystem late enough

From: Milton Miller
Date: Mon Feb 26 2007 - 11:45:33 EST


On Feb 27, 2007, at 2:24 AM, David Woodhouse wrote:
On Sun, 2007-02-25 at 20:13 -0800, Linus Torvalds wrote:
On Sun, 25 Feb 2007, David Woodhouse wrote:
Can you try adding something like

memset(start, 0xf0, end - start);

Yeah, I did that before giving up on it for the day and going in search
of dinner. It changes the failure mode to a BUG() in
cache_free_debugcheck(), at line 2876 of mm/slab.c

Ok, that's just strange.

In this case I hadn't left the 'return' in free_initrd_mem(). I was
poisoning the pages and then returning them to the pool as usual.

If I poison the pages and _don't_ return them to the pool, it boots
fine. PageReserved is set on every page in the initrd region; total
page_count() is equal to the number of pages (which doesn't
_necessarily_ mean that page_count() for every page is equal to 1 but
it's a strong hint that that's the case).

Looking in /dev/mem after it boots, I see that my poison is still
present throughout the whole region.

One obvious thing to do would be to remove all the "__initdata" entries in
mm/slab.c..

This is biting us long before we call free_initmem().

But I'd also like to see the full backtrace for the BUG_ON(),
in case that gives any clues at all.

I'll see if I can find a camera.

It smells like the pages weren't actually reserved in the first place
and we were blithely allocating them. The only problem with that theory
is that the initrd doesn't seem to be getting corrupted -- and if we
were handing out its pages like that then surely _something_ would have
scribbled on it before we tried to read it.

Yeah, I don't think it's necessarily initrd itself, I'd be more inclined
to think that the reason you see this change with the initrd unpacking is
simply that it does a lot of allocations for the initrd files, so I think
it is only indirectly involved - just because it ends up being a slab
user.

Whatever happens, initrd as a 'slab user' is fine. The crashes happen
_later_, when someone else is using the memory which used to belong to
the initrd. In that 'BUG at slab.c:2876' I mentioned above, r3 was
within the initrd region. As I said, I'll try to find a camera.


Just a thought,

Any chance you are using one of the unusal code paths, like the bootloader
moving the initrd or using a kernel-crash region?


milton

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/