Re: [PATCH 5 of 6] hotplug-memory: add section_ops

From: Jeremy Fitzhardinge
Date: Fri Apr 04 2008 - 01:33:44 EST


Dave Hansen wrote:
On Thu, 2008-04-03 at 18:12 -0700, Jeremy Fitzhardinge wrote:
Dave Hansen wrote:
On Thu, 2008-04-03 at 17:05 -0700, Jeremy Fitzhardinge wrote:
I think it might just be nicer to have a global list of these handlers
somewhere. The Xen driver can just say "put me on the list of
callbacks" and we'll call them at online_page(). I really don't think
we need to be passing an ops structure around.
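
A minimal sketch of that global-list idea could look something like the following; every name here is made up for illustration (it is not code from the patch), and locking is omitted:

#include <linux/list.h>
#include <linux/mm.h>

struct online_page_handler {
        struct list_head list;
        /* return nonzero if the handler claimed the page */
        int (*online)(struct page *page);
};

static LIST_HEAD(online_page_handlers);

void register_online_page_handler(struct online_page_handler *h)
{
        list_add_tail(&h->list, &online_page_handlers);
}

/* called from online_page() for each page being brought online */
static int run_online_page_handlers(struct page *page)
{
        struct online_page_handler *h;

        list_for_each_entry(h, &online_page_handlers, list)
                if (h->online(page))
                        return 1;       /* claimed; skip the generic path */
        return 0;
}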
Yes, but it seems a bit awkward. If we assume that:

1. Xen will be the only user of the hook, and
2. Xen-balloon hotplug is exclusive of real memory hotplug

then I guess it's reasonable (though if that's the case it would be simpler to just put a direct call under #ifdef CONFIG_XEN_BALLOON in there).

Yeah, I'm OK with something along those lines, too. I'd prefer sticking
some stubs in a header and putting the #ifdef there, if only for
aesthetic reasons.
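
As an illustration, the stubbed-header variant might be no more than this (xen_balloon_online_page() is a hypothetical name, not an existing function, and the header placement is just a sketch):

/* in a header that online_page()'s file already includes -- sketch only */
#ifdef CONFIG_XEN_BALLOON
extern int xen_balloon_online_page(struct page *page);
#else
static inline int xen_balloon_online_page(struct page *page)
{
        return 0;       /* not claimed; fall through to the generic path */
}
#endif

online_page() would then call it unconditionally and only do its usual work when the stub returns 0.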

Sure. But I think it's a very non-scalable approach; as soon as there's a second user who wants to do something like this, it's worth moving to an ops- or function-pointer-based approach.

But if we think there can be multiple callbacks, and they all get called on the online of each page, and there can be multiple kinds of hotplug memory, it gets pretty messy. Each callback has to determine "why was I called on this page?" and you'd have to work out which one actually does the job of onlining. It just seems cleaner to say "this section needs to be onlined like this", and there's no ambiguity.
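
Roughly, "this section needs to be onlined like this" boils down to something of this shape; the field and function names below are illustrative rather than the actual structure from the patch:

#include <linux/mm.h>

struct section_ops {
        /* called for each page as the section comes online */
        void (*online_page)(struct page *page);
};

/* the default: hand the page straight to the buddy allocator */
static void generic_online_page(struct page *page)
{
        ClearPageReserved(page);
        init_page_count(page);
        __free_page(page);
        /* (plus the totalram/totalhigh accounting) */
}

/* A Xen-ballooned section would instead point ->online_page at a routine
 * that parks the page in the balloon driver until a machine frame is
 * available to back it. */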

I really wish we'd stop calling it "page online". :)

Let me think out loud for a sec here. Here's how memory hotplug works
in a nutshell:

First step (add_memory() or probe time):
1. get more memory made available
2. create kva mapping for that memory (for lowmem)
3. allocate 'struct pages'

Second step, 'echo 1 > .../memoryXXX/online' time:
4. modify zone/pgdat spans (make the VM account for the memory)
5. Initialize the 'struct page'
6. free the memory into the buddy allocator
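
In code, that two-step split corresponds roughly to the following sketch against the 2.6.2x-era interfaces; the node id, address, and size are made up, and the exact call chains differ by architecture:

#include <linux/types.h>
#include <linux/memory_hotplug.h>

static int example_hot_add(void)
{
        u64 start = 0x100000000ULL;     /* hypothetical physical address */
        u64 size  = 128 << 20;          /* e.g. one 128MB section */
        int ret;

        /* steps 1-3: announce the memory, map it (for lowmem) and
         * allocate its struct pages; they are left Reserved */
        ret = add_memory(0, start, size);
        if (ret)
                return ret;

        /* steps 4-6 normally happen later, from userspace:
         *   echo online > /sys/devices/system/memory/memoryXXX/state
         * which ends up in online_pages() -> online_page() per page */
        return 0;
}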

You can't do (2) because Xen doesn't allow mappings to be created until
real backing is there. You've already done this, right?

Well, it hasn't been an issue so far, because I've only been working with x86-32 where all hotplug memory is highmem. But, yes, on x86-64 and other architectures it would have to defer creating the mappings until it gets the pages (page by page).

You don't want to do (6) either, because there is no mapping for the
page and it isn't committed in hardware, yet, so you don't want someone
grabbing it *out* of the buddy allocator and using it.

Right. I do that page by page in the balloon driver; each time I get a machine page, I bind it to the corresponding page structure and free it into the allocator.

And I skip the whole "echo online > /sys..." part, because it's redundant: the use of hotplug memory is an internal implementation detail of the balloon driver, which users needn't know about when they deal with it.
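
Concretely, the per-page step amounts to something like this. It's a sketch, not the actual balloon-driver code; set_phys_to_machine() is Xen's pseudo-physical-to-machine binding (the header it lives in varies between trees), and the lowmem-mapping case and error handling are skipped:

#include <linux/mm.h>
#include <asm/xen/page.h>       /* set_phys_to_machine() in the pv-ops tree */

static void balloon_online_one_page(struct page *page, unsigned long mfn)
{
        unsigned long pfn = page_to_pfn(page);

        /* bind the newly granted machine frame to this pseudo-physical page */
        set_phys_to_machine(pfn, mfn);

        /* then do what online_page() would have done: give the page
         * to the buddy allocator */
        ClearPageReserved(page);
        init_page_count(page);
        __free_page(page);
}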

Your solution is to take those first 1-5 steps, and have the balloon
driver call them directly. The online_page() modifications are because
5/6 are a bit intertwined right now. Are we on the same page so far?

Right.

Why don't we just free the page back into the balloon driver? Take the
existing steps 1-5, use them as they stand today, and just chop off step
6 for Xen. It'd save a bunch of this code churn and also stop us from
proliferating any kind of per-config section-online behavior like you're
asking about above.

That's more or less what I've done. I've grouped it as steps 1-4 when the balloon driver decides it needs more page structures, then 5 & 6 page by page when it actually gets some backing memory.
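
A caricature of that grouping, with the function names invented for the sketch (balloon_online_one_page() refers to the per-page sketch above):

#include <linux/memory_hotplug.h>

/* steps 1-4: the driver wants more struct pages, so hot-add a section.
 * In mainline, add_memory() covers steps 1-3 (plus the sysfs entry);
 * the zone/pgdat span accounting of step 4 would have to be pulled
 * forward here as well.  The pages stay Reserved and unbacked. */
static int balloon_add_section(int nid, u64 start, u64 size)
{
        return add_memory(nid, start, size);
}

/* steps 5-6: later, as the hypervisor hands over machine frames,
 * balloon_online_one_page(page, mfn) runs for each one. */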

That might also be generalizable to anyone else that wants the "fresh
meat" newly-hotplugged memory. Large page users might be other
consumers here.

Sure. The main problem is that 1-3 also ends up implicitly registering the new section with sysfs, so the bulk online interface becomes visible to usermode. If we make that optional (i.e., a separate explicit call the memory-adder can choose to make) then everything is rosy.
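
One possible shape for making it optional; both helpers called here are invented for the sketch:

#include <linux/types.h>

/* both helpers below are made-up names, standing in for the existing
 * add path and the sysfs registration respectively */
int hotadd_section_core(int nid, u64 start, u64 size);          /* steps 1-3 */
int hotadd_section_register_sysfs(u64 start, u64 size);         /* sysfs entry */

int add_memory_section(int nid, u64 start, u64 size, bool register_sysfs)
{
        int ret;

        ret = hotadd_section_core(nid, start, size);
        if (ret)
                return ret;

        if (register_sysfs)
                ret = hotadd_section_register_sysfs(start, size);

        return ret;
}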

I'm already anticipating using the ops mechanism to support another class of Xen hotplug memory for managing large pages.

Do tell. :)

I think I mentioned it before. If we 1) modify Xen to manage domain memory in large pages, and 2) have a reasonably small section size, then we can do all memory management directly via the hotplug interface. Bringing each (large) page online would still require some explicit action, but it would be a much closer fit to how the hotplug machinery currently works. Then a small user- or kernel-mode policy daemon could use it to replicate the existing balloon driver's functionality.

J