Re: [PATCH v4 1/2] memory-hotplug: add automatic onlining policy for the newly added memory

From: Vitaly Kuznetsov
Date: Thu Jan 14 2016 - 03:49:40 EST


David Rientjes <rientjes@xxxxxxxxxx> writes:

> On Wed, 13 Jan 2016, Vitaly Kuznetsov wrote:
>
>> >> diff --git a/Documentation/memory-hotplug.txt b/Documentation/memory-hotplug.txt
>> >> index ce2cfcf..ceaf40c 100644
>> >> --- a/Documentation/memory-hotplug.txt
>> >> +++ b/Documentation/memory-hotplug.txt
>> >> @@ -254,12 +254,23 @@ If the memory block is online, you'll read "online".
>> >> If the memory block is offline, you'll read "offline".
>> >>
>> >>
>> >> -5.2. How to online memory
>> >> +5.2. Memory onlining
>> >
>> > Idk why you're changing this title since you didn't change it in the table
>> > of contents and it already pairs with "6.2. How to offline memory".
>> >
>> > This makes it seem like you're covering all memory onlining operations in
>> > the kernel (including xen onlining) rather than just memory onlined by
>> > root. It doesn't cover the fact that xen onlining can be done without
>> > automatic onlining, so I would leave this section's title as it is and
>> > only cover aspects of memory onlining that users are triggering
>> > themselves.
>>
>> Ok, I changed the title to reflect the fact that a special action to
>> online memory is not always required any more but as the global policy
>> stays 'offline' by default for now let's keep the original title.
>>
>
> Thanks.
>
>> >> + /* online pages if requested */
>> >> + if (online)
>> >> + online_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>> >> + MMOP_ONLINE_KEEP);
>> >> +
>> >> goto out;
>> >>
>> >> error:
>> >
>> > Well, shucks, what happens if online_pages() fails, such as if a memory
>> > hot-add notifier returns an errno for MEMORY_GOING_ONLINE? The memory was
>> > added but not subsequently onlined, although auto onlining was set, so how
>> > does userspace know the state it is in?
>>
>> Bad ... we could have checked the return value but I don't see a proper
>> way to handling it here: if we managed to online some blocks we can't
>> revert back. We'll probably have to online pages block-by-block (e.g. by
>> utilizing memory_block_change_state()) handling possible failures.
>>
>
> My suggestion is to just simply document that auto-onlining can add the
> memory but fail to online it and the failure is silent to userspace. If
> userspace cares, it can check the online status of the added memory blocks
> itself.

The problem is not only that it's silent, but also that
/sys/devices/system/memory/*/state will lie as we create all memory
blocks in MEM_ONLINE state and from online_pages() error we can't figure
out which particular block failed. 'v5' which I sent yesterday is
supposed to fix the issue (blocks are onlined with
memory_block_change_state() which handles failures.

--
Vitaly