Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19

From: Mel Gorman
Date: Tue Nov 01 2005 - 12:19:40 EST


On Wed, 2 Nov 2005, Kamezawa Hiroyuki wrote:

> Ingo Molnar wrote:
> > so it's all about expectations: _could_ you reasonably remove a piece of
> > RAM? Customer will say: "I have stopped all nonessential services, and free
> > RAM is at 90%, still I cannot remove that piece of faulty RAM, fix the
> > kernel!". No reasonable customer will say: "True, I have all RAM used up in
> > mlock()ed sections, but i want to remove some RAM nevertheless".
> >
> Hi, I'm one of men in -lhms
>
> In my understanding...
> - Memory Hotremove on IBM's LPAR? approach is
> [remove some amount of memory from somewhere.]
> For this approach, Mel's patch will work well.
> But this will not guarantee that a user can remove a specified range of
> memory at any time, because how a memory range is used is not defined by
> an admin but by the kernel automatically. But to extract some amount of
> memory, Mel's patch is very important and they need this.
>
> My own target is NUMA node hotplug. What NUMA node hotplug wants is
> - [remove the range of memory] For this approach, the admin should define
> a *core* node and removable nodes. Memory on a removable node is removable.
> Dividing the area into removable and not-removable is needed, because
> we cannot allocate any kernel object in the removable area.
> The removable area should be 100% removable. The customer can know the
> limitation before using it.
>

In this case, we would want some mechanism that says "don't put awkward
pages in this NUMA node" in a clear way. One way we could do this is:

1. Move fallback_allocs to be per-node. fallback_allocs is currently
defined as

    int fallback_allocs[RCLM_TYPES-1][RCLM_TYPES+1] = {
        {RCLM_NORCLM, RCLM_FALLBACK, RCLM_KERN, RCLM_EASY, RCLM_TYPES},
        {RCLM_EASY, RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN, RCLM_TYPES},
        {RCLM_KERN, RCLM_FALLBACK, RCLM_NORCLM, RCLM_EASY, RCLM_TYPES}
    };

The effect is that an RCLM_NORCLM allocation falls back to
RCLM_FALLBACK, RCLM_KERN and RCLM_EASY, in that order, and then gives up.

2. Architectures would need to provide a function that allocates and
populates a fallback_allocs[][] array. If they do not provide one, a
generic function uses an array like the one above.

3. When adding a node that must be removable, make the array look like
this

    int fallback_allocs[RCLM_TYPES-1][RCLM_TYPES+1] = {
        {RCLM_NORCLM, RCLM_TYPES, RCLM_TYPES, RCLM_TYPES, RCLM_TYPES},
        {RCLM_EASY, RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN, RCLM_TYPES},
        {RCLM_KERN, RCLM_TYPES, RCLM_TYPES, RCLM_TYPES, RCLM_TYPES},
    };

The effect of this is that only easily reclaimable allocations will end up
in this node. This would be a straightforward addition to build on top of
this set of patches. The difference would only be visible to
architectures that care.
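
To make the effect of the two tables concrete, here is a rough userspace
sketch of the lookup, not code from the patch set. The numeric values of
the RCLM_* constants are an assumption, chosen only so that they match the
array dimensions and row order shown above. may_use_pool() is a
hypothetical helper, and reading each row as "the allocation's own type
first, then fallbacks, terminated by RCLM_TYPES" is my interpretation of
the description.

```c
#include <assert.h>

/* Assumed values: they fit the [RCLM_TYPES-1][RCLM_TYPES+1] dimensions. */
enum { RCLM_NORCLM, RCLM_EASY, RCLM_KERN, RCLM_FALLBACK, RCLM_TYPES };

/* Generic table: every allocation type may eventually fall back everywhere. */
static const int generic_fallback[RCLM_TYPES - 1][RCLM_TYPES + 1] = {
	{RCLM_NORCLM, RCLM_FALLBACK, RCLM_KERN,   RCLM_EASY, RCLM_TYPES},
	{RCLM_EASY,   RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN, RCLM_TYPES},
	{RCLM_KERN,   RCLM_FALLBACK, RCLM_NORCLM, RCLM_EASY, RCLM_TYPES}
};

/* Table for a node that must stay removable: only RCLM_EASY falls back. */
static const int removable_fallback[RCLM_TYPES - 1][RCLM_TYPES + 1] = {
	{RCLM_NORCLM, RCLM_TYPES,    RCLM_TYPES,  RCLM_TYPES, RCLM_TYPES},
	{RCLM_EASY,   RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN,  RCLM_TYPES},
	{RCLM_KERN,   RCLM_TYPES,    RCLM_TYPES,  RCLM_TYPES, RCLM_TYPES}
};

/*
 * Hypothetical helper: return 1 if an allocation of type alloctype is
 * allowed to take pages from the given pool, 0 otherwise. The row is
 * scanned until the RCLM_TYPES terminator.
 */
static int may_use_pool(const int (*table)[RCLM_TYPES + 1],
			int alloctype, int pool)
{
	assert(alloctype >= 0 && alloctype < RCLM_TYPES - 1);

	for (int i = 0; i < RCLM_TYPES + 1; i++) {
		int candidate = table[alloctype][i];

		if (candidate == RCLM_TYPES)
			break;		/* end of this fallback list */
		if (candidate == pool)
			return 1;
	}
	return 0;
}
```

With the generic table, may_use_pool(generic_fallback, RCLM_NORCLM,
RCLM_EASY) is 1. With the removable table the same query is 0, so
RCLM_NORCLM and RCLM_KERN allocations never take pages from the RCLM_EASY
pool of that node.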

> What I'm considering now is this:
> - removable area is hot-added area
> - not-removable area is memory which is visible to kernel at boot time.
> (I'd like to achieve this by the limitation : hot-added node goes into only
> ZONE_HIGHMEM)


ZONE_HIGHMEM can still end up with PTE pages if allocating PTE pages from
highmem is configured. This is bad because PTE pages are not easily
reclaimable. With the above approach, nodes that were not hot-added but
have a ZONE_HIGHMEM will still be able to use it for PTEs. But when a node
is hot-added, it will have a ZONE_HIGHMEM that is not used for PTE
allocations because they are not RCLM_EASY allocations.
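
A small simulation of that difference, again only a sketch under stated
assumptions: struct node_sketch and alloc_block() are hypothetical names,
the RCLM_* values are assumed as before, a NULL table pointer stands in
for "the architecture provided nothing, use the generic table", and a PTE
allocation is modelled as RCLM_KERN purely as a stand-in for "some type
that is not RCLM_EASY". It also assumes the free memory of the zone
currently sits in the RCLM_EASY pool.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: they fit the [RCLM_TYPES-1][RCLM_TYPES+1] dimensions. */
enum { RCLM_NORCLM, RCLM_EASY, RCLM_KERN, RCLM_FALLBACK, RCLM_TYPES };

typedef int fallback_row[RCLM_TYPES + 1];

static const fallback_row generic_fallback[RCLM_TYPES - 1] = {
	{RCLM_NORCLM, RCLM_FALLBACK, RCLM_KERN,   RCLM_EASY, RCLM_TYPES},
	{RCLM_EASY,   RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN, RCLM_TYPES},
	{RCLM_KERN,   RCLM_FALLBACK, RCLM_NORCLM, RCLM_EASY, RCLM_TYPES}
};

static const fallback_row removable_fallback[RCLM_TYPES - 1] = {
	{RCLM_NORCLM, RCLM_TYPES,    RCLM_TYPES,  RCLM_TYPES, RCLM_TYPES},
	{RCLM_EASY,   RCLM_FALLBACK, RCLM_NORCLM, RCLM_KERN,  RCLM_TYPES},
	{RCLM_KERN,   RCLM_TYPES,    RCLM_TYPES,  RCLM_TYPES, RCLM_TYPES}
};

/* Hypothetical per-node state, not a structure from the patch set. */
struct node_sketch {
	const fallback_row *fallback;	/* NULL: use the generic table */
	int free_blocks[RCLM_TYPES];	/* free blocks in each pool */
};

/*
 * Walk the node's fallback row for alloctype and take one block from the
 * first pool that has any. Returns the pool used, or -1 if every pool in
 * the row is empty.
 */
static int alloc_block(struct node_sketch *node, int alloctype)
{
	const fallback_row *table =
		node->fallback ? node->fallback : generic_fallback;

	assert(alloctype >= 0 && alloctype < RCLM_TYPES - 1);

	for (int i = 0; i < RCLM_TYPES + 1; i++) {
		int pool = table[alloctype][i];

		if (pool == RCLM_TYPES)
			break;		/* end of this fallback list */
		if (node->free_blocks[pool] > 0) {
			node->free_blocks[pool]--;
			return pool;
		}
	}
	return -1;
}
```

For a boot-time node with .fallback = NULL and four free RCLM_EASY blocks,
alloc_block(&node, RCLM_KERN) ends up taking an RCLM_EASY block. For a
hot-added node with .fallback = removable_fallback and the same free
blocks, the same call returns -1 and the node stays free of such pages.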

--
Mel Gorman
Part-time Phd Student                          Java Applications Developer
University of Limerick                         IBM Dublin Software Lab