Re: [PATCH v3] mm: fix panic in __alloc_pages

From: Alexey Makhalov
Date: Wed Dec 08 2021 - 03:19:21 EST


Hi Michal,

> On Dec 8, 2021, at 12:04 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 07-12-21 17:17:27, Alexey Makhalov wrote:
>>
>>
>>> On Dec 7, 2021, at 9:13 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>>>
>>> On 07.12.21 18:02, Alexey Makhalov wrote:
>>>>
>>>>
>>>>> On Dec 7, 2021, at 8:36 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
>>>>>
>>>>> On Tue 07-12-21 17:27:29, Michal Hocko wrote:
>>>>> [...]
>>>>>> So your proposal is to drop set_node_online from the patch and add it as
>>>>>> a separate one which handles
>>>>>> 	- sysfs part (i.e. do not register a node which doesn't span a
>>>>>> 	  physical address space)
>>>>>> 	- hotplug side of (drop the pgd allocation, register node lazily
>>>>>> 	  when a first memblocks are registered)
>>>>>
>>>>> In other words, the first stage
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index c5952749ad40..f9024ba09c53 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -6382,7 +6382,11 @@ static void __build_all_zonelists(void *data)
>>>>>  	if (self && !node_online(self->node_id)) {
>>>>>  		build_zonelists(self);
>>>>>  	} else {
>>>>> -		for_each_online_node(nid) {
>>>>> +		/*
>>>>> +		 * All possible nodes have pgdat preallocated
>>>>> +		 * free_area_init
>>>>> +		 */
>>>>> +		for_each_node(nid) {
>>>>>  			pg_data_t *pgdat = NODE_DATA(nid);
>>>>>
>>>>>  			build_zonelists(pgdat);
>>>>
>>>> Will it blow up memory usage for the nodes which might never be onlined?
>>>> I prefer the idea of init on demand.
>>>>
>>>> Even now there is an existing problem.
>>>> In my experiments, I observed _huge_ memory consumption increase by increasing number
>>>> of possible numa nodes. I’m going to report it in separate mail thread.
>>>
>>> I already raised that PPC might be problematic in that regard. Which
>>> architecture / setup do you have in mind that can have a lot of possible
>>> nodes?
>>>
>> It is x86_64 VMware VM, not the regular one, but specially configured (1 vCPU per node,
>> with hot-plug support, 128 possible nodes)
>
> This is slightly tangent but could you elaborate more on this setup and
> reasoning behind it. I was already curious when you mentioned this
> previously. Why would you want to have so many nodes and having 1:1 with
> CPUs. What is the resulting NUMA topology?

This 128-node setup was used purely for development purposes; it is how the issue with
hot-adding NUMA nodes was found. The original issue is also present with a realistic
number of nodes.
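
As an aside on the quoted diff: the sketch below is a minimal userspace model, not
kernel code, of the difference between walking only online nodes and walking all
possible nodes. The bitmap layout, MAX_NODES, and every helper name are assumptions
made up for illustration; only the 128-node figure comes from the setup described above.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 128	/* assumed: mirrors the 128 possible nodes of the test VM */

/* Toy stand-in for node state bitmaps (hypothetical, not the kernel's nodemask_t). */
struct node_masks {
	uint64_t possible[MAX_NODES / 64];
	uint64_t online[MAX_NODES / 64];
};

/* Mark node nid in the given mask. */
static void mask_set(uint64_t *mask, unsigned int nid)
{
	assert(nid < MAX_NODES);
	mask[nid / 64] |= UINT64_C(1) << (nid % 64);
}

/* Return 1 if node nid is set in the given mask, 0 otherwise. */
static int mask_test(const uint64_t *mask, unsigned int nid)
{
	assert(nid < MAX_NODES);
	return (int)((mask[nid / 64] >> (nid % 64)) & 1);
}

/* Count how many nodes a loop over the given mask would visit. */
static unsigned int count_visited(const uint64_t *mask)
{
	unsigned int nid, n = 0;

	for (nid = 0; nid < MAX_NODES; nid++)
		if (mask_test(mask, nid))
			n++;
	return n;
}

/* Build the scenario: every node is possible, only the first `booted` are online. */
static struct node_masks make_masks(unsigned int booted)
{
	struct node_masks m = { {0}, {0} };
	unsigned int nid;

	for (nid = 0; nid < MAX_NODES; nid++) {
		mask_set(m.possible, nid);
		if (nid < booted)
			mask_set(m.online, nid);
	}
	return m;
}
```

In this model, with 4 nodes online out of 128 possible, a loop over the online mask
visits 4 nodes while a loop over the possible mask visits all 128. Any per-node work
done in the second kind of loop therefore scales with the number of possible nodes.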

Thanks,
—Alexey
