Re: [PATCH] mm: page_alloc: don't allocate page from memoryless nodes

From: Qi Zheng
Date: Tue Feb 14 2023 - 07:34:27 EST




On 2023/2/14 19:44, Mike Rapoport wrote:
(added x86 folks)

On Tue, Feb 14, 2023 at 12:29:42PM +0100, David Hildenbrand wrote:
On 14.02.23 12:26, Qi Zheng wrote:
On 2023/2/14 19:22, David Hildenbrand wrote:

TBH, this is the first time I hear of NODE_MIN_SIZE and it seems to be a
pretty x86 specific thing.

Are we sure we want to get NODE_MIN_SIZE involved?

Maybe add an arch_xxx() to handle it?

I still haven't figured out what we want to achieve with NODE_MIN_SIZE at
all. It smells like an arch-specific hack looking at

"Don't confuse VM with a node that doesn't have the minimum amount of
memory"

Why shouldn't mm-core deal with that?

Well, a node with <4M RAM is not very useful and bears all the overhead of
an extra live node.

But, hey, why won't we just drop that '< NODE_MIN_SIZE' and let people with
weird HW configurations just live with this?

Just to sum up, whether we deal with '< NODE_MIN_SIZE' or not, IIUC, the
following two should be modified:

1) we should skip memoryless nodes completely in find_next_best_node():

@@ -6382,8 +6378,11 @@ int find_next_best_node(int node, nodemask_t
*used_node_mask)
int min_val = INT_MAX;
int best_node = NUMA_NO_NODE;

- /* Use the local node if we haven't already */
- if (!node_isset(node, *used_node_mask)) {
+ /*
+ * Use the local node if we haven't already. But for memoryless
local
+ * node, we should skip it and fallback to other nodes.
+ */
+ if (!node_isset(node, *used_node_mask) && node_state(node,
N_MEMORY)) {
node_set(node, *used_node_mask);
return node;
}

This also fixes the bug mentioned in commit message.

2) we should call node_states_clear_node() before build_all_zonelists()
in offline_pages():

@@ -1931,12 +1931,12 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages,
/* reinitialise watermarks and update pcp limits */
init_per_zone_wmark_min();

+ node_states_clear_node(node, &arg);
if (!populated_zone(zone)) {
zone_pcp_reset(zone);
build_all_zonelists(NULL);
}

- node_states_clear_node(node, &arg);
if (arg.status_change_nid >= 0) {
kcompactd_stop(node);
kswapd_stop(node);

Otherwise, the node whose N_MEMORY state is about to be cleared will still be established in the fallback list of other nodes.

Right?

Thanks,
Qi

I'd appreciate an explanation of the bigger picture, what the issue is and
what the approach to solve it is (including memory onlining/offlining).

--
Thanks,

David / dhildenb



--
Thanks,
Qi