Re: assert/crash in __rmqueue() when enabling CONFIG_NUMA

From: Bob Picco
Date: Wed May 03 2006 - 21:32:20 EST


Nick Piggin wrote: [Tue May 02 2006, 10:25:57AM EDT]
> Martin J. Bligh wrote:
> >>Oh that's a 32bit kernel. I don't think the 32bit NUMA has ever worked
> >>anywhere but some Summit systems (at least every time I tried it it
> >>blew up on me and nobody seems to use it regularly). Maybe it would be
> >>finally time to mark it CONFIG_BROKEN though or just remove it (even
> >>by design it doesn't work very well)
> >
> >
> >Bollocks. It works fine, and is tested every single day, on every git
> >release, and every -mm tree.
>
> Whatever the case, there definitely does not appear to be sufficient
> zone alignment enforced for the buddy allocator. I cannot see how it
> could work if zones are not aligned on 4MB boundaries.
>
> Maybe some architectures / subarch code naturally does this for us,
> but Ingo is definitely hitting this bug because his config does not
> (align, that is).
>
> I've randomly added a couple more cc's.
>
The patch below isn't compile tested or correct for those cases where
alloc_remap is called or where arch code has allocated node_mem_map for
CONFIG_FLAT_NODE_MEM_MAP. It's just conveying what I believe the issue is.

Andy added code to buddy allocator which doesn't require the zone's endpoints
to be aligned to MAX_ORDER. I think the issue is that the buddy
allocator requires the node_mem_map's endpoints to be MAX_ORDER aligned.
Otherwise __page_find_buddy could compute a buddy not in node_mem_map
for partial MAX_ORDER regions at zone's endpoints. page_is_buddy will
detect that these pages at endpoints aren't PG_buddy (they were zeroed
out by bootmem allocator and not part of zone). Of course the negative
here is we could waste a little memory but the positive is eliminating
all the old checks for zone boundary conditions.

SPARSEMEM won't encounter this issue because of MAX_ORDER size
constraint when SPARSEMEM is configured. ia64 VIRTUAL_MEM_MAP doesn't
need the logic either because the holes and endpoints are handled
differently. This leaves checking alloc_remap and other arches which
privately allocate for node_mem_map.

Any how I could be totally wrong but like I said this requires more
thought.

bob


Index: linux-2.6.17-rc3/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc3.orig/mm/page_alloc.c 2006-04-27 09:44:02.000000000 -0400
+++ linux-2.6.17-rc3/mm/page_alloc.c 2006-05-03 14:50:13.000000000 -0400
@@ -2123,14 +2123,23 @@ static void __init alloc_node_mem_map(st
#ifdef CONFIG_FLAT_NODE_MEM_MAP
/* ia64 gets its own node_mem_map, before this, without bootmem */
if (!pgdat->node_mem_map) {
- unsigned long size;
+ unsigned long size, start, end;
struct page *map;

- size = (pgdat->node_spanned_pages + 1) * sizeof(struct page);
+ /*
+ * The zone's endpoints aren't required to be MAX_ORDER
+ * aligned but the node_mem_map endpoints must be in order
+ * for the buddy allocator to function correctly.
+ */
+ start = pgdat->node_start_pfn & ~((1 << (MAX_ORDER - 1)) - 1);
+ end = start + pgdat->node_spanned_pages;
+ end = (end + ((1 << (MAX_ORDER - 1)) - 1) &
+ ~((1 << (MAX_ORDER - 1)) - 1);
+ size = (end - start) * sizeof(struct page);
map = alloc_remap(pgdat->node_id, size);
if (!map)
map = alloc_bootmem_node(pgdat, size);
- pgdat->node_mem_map = map;
+ pgdat->node_mem_map = map + ( pgdat->node_start_pfn - start);
}
#ifdef CONFIG_FLATMEM
/*
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/