Re: [PATCH 1/3] powerpc/numa: Set numa_node for all possible cpus

From: Srikar Dronamraju
Date: Fri Mar 13 2020 - 07:27:58 EST


* Vlastimil Babka <vbabka@xxxxxxx> [2020-03-12 17:41:58]:

> On 3/12/20 5:13 PM, Srikar Dronamraju wrote:
> > * Vlastimil Babka <vbabka@xxxxxxx> [2020-03-12 14:51:38]:
> >
> >> > * Vlastimil Babka <vbabka@xxxxxxx> [2020-03-12 10:30:50]:
> >> >
> >> >> On 3/12/20 9:23 AM, Sachin Sant wrote:
> >> >> >> On 12-Mar-2020, at 10:57 AM, Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> wrote:
> >> >> >> * Michal Hocko <mhocko@xxxxxxxxxx> [2020-03-11 12:57:35]:
> >> >> >>> On Wed 11-03-20 16:32:35, Srikar Dronamraju wrote:
> >> I think we do need well defined and documented rules around node_to_mem_node(),
> >> cpu_to_node(), existence of NODE_DATA, various node_states bitmaps etc so
> >> everyone handles it the same, safe way.
>
> So let's try to brainstorm how this would look like? What I mean are some rules
> like below, even if some details in my current understanding are most likely
> incorrect:
>

Agree.

> with nid present in:
> N_POSSIBLE - pgdat might not exist, node_to_mem_node() must return some online
> node with memory so that we don't require everyone to search for it in slightly
> different ways
> N_ONLINE - pgdat must exist, there doesn't have to be present memory,
> node_to_mem_node() still has to return something else (?)

Right, think this has been taken care of at this time.

> N_NORMAL_MEMORY - there is present memory, node_to_mem_node() returns itself
> N_HIGH_MEMORY - node has present high memory
>

dont see any problems with the above two to. That leaves us with N_POSSIBLE.

> >
> > Other option would be to tweak Kirill Tkhai's patch such that we call
> > kvmalloc_node()/kzalloc_node() if node is online and call kvmalloc/kvzalloc
> > if the node is offline.
>
> I really would like a solution that hides these ugly details from callers so
> they don't have to workaround the APIs we provide. kvmalloc_node() really
> shouldn't crash, and it should fallback automatically if we don't give it
> __GFP_THISNODE
>

Agree thats its better to make API's robust where possible.

> However, taking a step back, memcg_alloc_shrinker_maps() is probably rather
> wasteful on systems with 256 possible nodes and only few present, by allocating
> effectively dead structures for each memcg.
>

If we dont allocate now, we would have to allocate them when we online the
nodes. To me it looks better to allocate as soon as the nodes are onlined,

> SLUB tries to be smart, so it allocates the per-node per-cache structures only
> when the node goes online in slab_mem_going_online_callback(). This is why
> there's a crash when such non-existing structures are accessed for a node that's
> not online, and why they shouldn't be accessed.
>
> Perhaps memcg should do the same on-demand allocation, if possible.
>

Right.

--
Thanks and Regards
Srikar Dronamraju