Re: [tip:core/memblock] x86, memblock: Fix crashkernel allocation

From: H. Peter Anvin
Date: Wed Oct 06 2010 - 19:09:35 EST


On 10/06/2010 03:47 PM, Vivek Goyal wrote:
>
> I really don't mind fixing the things properly in long term, just that I am
> running out of ideas regarding how to fix it in proper way.
>
> To me the best thing would be that this whole allocation thing be dynamic
> from user space where kexec will run, determine what it is loading,
> determine what are the memory constraints on these segments (min, upper
> limit, alignment etc), and then ask kernel for reserving contiguous
> memory. This kind of dynamic reservation will remove lot of problems
> associated with crashkernel= reservations.
>
> But I am not aware of any way of doing dynamic allocation and it certainly
> does not seem to be easy to be able to allocate 128M of memory contiguously.
>
> Because we don't have a way to reserve memory dynamically later, we end up
> doing a big chunk of reservation using kernel command line and later
> figure out what to load where. Now with this approach kexec has not even run
> so how can it tell you what the memory constraints are?
>
> So to me one of the ways of properly fixing is adding some kind of
> capability to reserve the memory dynamically (may be using sys_kexec())
> and get rid of this notion of reserving memory at boot time.

The problem, of course, with allocating very large chunks of memory at
runtime is that there are going to be some number of non-movable and
non-evictable pages that are going to break up the contiguous ranges.
However, the mm recently added support for moving most pages, which
should make that kind of allocation a lot more feasible. I haven't
experimented with how well it works in practice, but I rather suspect that as
long as the crashkernel is installed sufficiently early in the boot
process it should have a very good probability of success. Another
option, although one which has its own hackiness issues, is to do a
conservative allocation at boot time in preparation of the kexec call,
which is then freed. This doesn't really address the issue of location,
though, which is part of the problem here.

> The other concern you raised is hiding constraints from kernel. At this
> point of time the only problem with crashkernel=X@0 syntax is that it
> does not tell you whether to look for memory bottom up or top down. How
> about if we specify it explicitly in the syntax so that kernel does not
> have to assume things?

See below.

> In fact the initial crashkernel syntax was crashkernel=X@Y. This meant
> allocate X amount of memory at location Y. This left no ambiguity and
> kernel did not have to assume things. It had the problem though that
> we might not have physical RAM at location Y. So I think that's when
> somebody came up with the idea of crashkernel=X@0 so that we ideally
> want memory at location 0, but if you can't provide that, then provide
> anything available next scanning bottom up.
>
> So the only part missing from syntax is explicitly specifying "next
> available location scanning bottom up". If we add that to syntax then
> kernel does not have to make assumptions. (except the alignment part).
>
> So how about modifying syntax to crashkernel=X@Y#BU.
>
> The "#BU" part can be optional and in that case kernel is free to allocate
> memory either top down or bottom up.
>
> Or any other string which can communicate the bottom up part in a more
> intuitive manner.

The whole problem here is that "bottom up" isn't the true constraint --
it's a proxy for "this chunk needs < address X, this chunk needs <
address Y, ..." which is the real issue. This is particularly messy
since low memory is a (sometimes very) precious resource that is used by
a lot of things (BIOS stubs, DMA-mask-limited hardware devices, and
perhaps especially 1:1 mappable pages on 32 bits, and so on), and one of
the major reasons we want to switch to a top-down allocation scheme is
to not waste a precious resource when we don't have to.
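
To make the distinction concrete, here is a minimal user-space sketch of a
top-down search that honors an explicit upper limit rather than a scan
direction. It is not the memblock code; struct free_range, NO_RANGE and
find_top_down_below() are hypothetical names invented for this example, and
the free map is assumed to be a plain array of non-overlapping ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-memory descriptor: the half-open range [start, end). */
struct free_range {
	uint64_t start;
	uint64_t end;
};

#define NO_RANGE UINT64_MAX	/* returned when nothing fits */

/*
 * Return the highest align-aligned base such that [base, base + size)
 * lies inside one free range and ends at or below 'limit'.
 * Returns NO_RANGE if no such placement exists.
 */
uint64_t find_top_down_below(const struct free_range *map, size_t n,
			     uint64_t size, uint64_t align, uint64_t limit)
{
	uint64_t best = NO_RANGE;
	size_t i;

	/* size must be nonzero; align must be a nonzero power of two */
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (i = 0; i < n; i++) {
		/* Clip the range so nothing is placed above the limit. */
		uint64_t end = map[i].end < limit ? map[i].end : limit;
		uint64_t base;

		if (end <= map[i].start || end - map[i].start < size)
			continue;

		/* Highest candidate in this range, rounded down to align. */
		base = (end - size) & ~(align - 1);
		if (base < map[i].start)
			continue;

		if (best == NO_RANGE || base > best)
			best = base;
	}
	return best;
}
```

The caller states the real constraint (the limit) and the allocator still
takes the highest address that satisfies it, so memory below that point is
left alone whenever possible.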

The one improvement one could make to the crashkernel= syntax is perhaps
"crashkernel=X<Y" meaning "allocate entirely below Y", since that is (at
least in part) the real constraint. It could even be extended to
multiple segments: "crashkernel=X<Y,Z<W,..." if we really need to...
that way you have your preallocation.
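
As a rough illustration of what parsing such a "crashkernel=X<Y,Z<W" value
could look like, here is a self-contained user-space sketch. The syntax is
only a proposal in this thread, so everything here is an assumption: struct
crash_seg, parse_size() and parse_crashkernel_below() are hypothetical names,
and parse_size() merely imitates the K/M/G suffix handling of the kernel's
memparse() rather than calling it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: one "size<limit" request from the command line. */
struct crash_seg {
	uint64_t size;	/* bytes to reserve */
	uint64_t limit;	/* region must lie entirely below this address */
};

/*
 * Parse a number with an optional K/M/G suffix and advance *p past it.
 * Returns 0 on success, -1 if *p does not start with a digit.
 */
static int parse_size(const char **p, uint64_t *out)
{
	char *end;
	unsigned long long v;

	if (!isdigit((unsigned char)**p))
		return -1;
	v = strtoull(*p, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		end++;
		break;
	default:
		break;
	}
	*out = v;
	*p = end;
	return 0;
}

/*
 * Parse "X<Y[,Z<W...]" into segs[]. Returns the number of segments,
 * or -1 if the string is malformed, has more than 'max' segments, or
 * asks for a segment that cannot fit below its own limit.
 */
int parse_crashkernel_below(const char *spec, struct crash_seg *segs,
			    size_t max)
{
	size_t n = 0;

	assert(spec != NULL && segs != NULL);

	for (;;) {
		if (n == max)
			return -1;
		if (parse_size(&spec, &segs[n].size) < 0 || *spec != '<')
			return -1;
		spec++;
		if (parse_size(&spec, &segs[n].limit) < 0)
			return -1;
		if (segs[n].size == 0 || segs[n].size > segs[n].limit)
			return -1;
		n++;

		if (*spec == '\0')
			return (int)n;
		if (*spec != ',')
			return -1;
		spec++;
	}
}
```

Under these assumptions "128M<4G" yields one segment and "64K<1M,128M<4G"
yields two, while the existing "128M@0" form is rejected by this parser, so
the two syntaxes would have to be distinguished before calling it.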

-hpa