[PATCH -V2] mm/migrate: fix CPUHP state to update node demotion order

From: Huang Ying
Date: Mon Sep 27 2021 - 04:11:45 EST


The node demotion order must be updated during CPU hotplug, because
whether a NUMA node has any CPU may influence the demotion order. The
update function must run after node_states[N_CPU] has been updated,
which happens in CPUHP_AP_ONLINE_DYN during CPU online and in
CPUHP_MM_VMSTAT_DEAD during CPU offline. However, commit 884a6e5d1f93
("mm/migrate: update node demotion order on hotplug events") registers
the update function in CPUHP_AP_ONLINE_DYN for both CPU online and
offline, which does not satisfy this ordering requirement.

Fix this by adding CPUHP_AP_MM_DEMOTION_ONLINE and
CPUHP_MM_DEMOTION_DEAD, which are called after CPUHP_AP_ONLINE_DYN
during CPU online and after CPUHP_MM_VMSTAT_DEAD during CPU offline
respectively, and register the update function on them.

Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
Cc: Zi Yan <ziy@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxx>
Cc: Wei Xu <weixugc@xxxxxxxxxx>
Cc: Oscar Salvador <osalvador@xxxxxxx>
Cc: David Rientjes <rientjes@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Greg Thelen <gthelen@xxxxxxxxxx>
Cc: Keith Busch <kbusch@xxxxxxxxxx>

Changes:

v2:

- Revise the state names to follow the naming convention, per Thomas' comments.

- Use cpuhp_setup_state() to set up the initial demotion order, per Mika's comments.
---
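The ordering argument in the changelog can be checked with a small
user-space model. This is only a sketch under stated assumptions: every
sim_* name is hypothetical, the two "dynamic" slots are fixed here
although the kernel allocates them at run time, and the model relies on
the hotplug core running startup callbacks in ascending state order and
teardown callbacks in descending state order. It makes no claim about
the relative order of real dynamic slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a few cpuhp states. */
enum sim_state {
	SIM_MM_DEMOTION_DEAD,		/* lower value: torn down later */
	SIM_MM_VMSTAT_DEAD,
	SIM_AP_ONLINE_DYN_VMSTAT,	/* assumed slot of the vmstat callback */
	SIM_AP_ONLINE_DYN_MIGRATE,	/* assumed slot of the old migrate callbacks */
	SIM_AP_MM_DEMOTION_ONLINE,
	SIM_NR_STATES,
};

struct sim_node {
	int nr_cpus;		/* online CPUs on this node */
	bool n_cpu;		/* stand-in for node_states[N_CPU] */
	bool n_cpu_seen;	/* value observed by the demotion update */
};

typedef void (*sim_cb)(struct sim_node *node);

struct sim_step {
	sim_cb startup;
	sim_cb teardown;
};

/* Stand-in for the vmstat callbacks that refresh node_states[N_CPU]. */
static void sim_vmstat_update(struct sim_node *node)
{
	node->n_cpu = node->nr_cpus > 0;
}

/* Stand-in for the demotion order update: records what it observed. */
static void sim_demotion_update(struct sim_node *node)
{
	node->n_cpu_seen = node->n_cpu;
}

/* Layout before this patch: both migrate callbacks in a dynamic slot. */
static const struct sim_step sim_old_table[SIM_NR_STATES] = {
	[SIM_MM_VMSTAT_DEAD]		= { NULL, sim_vmstat_update },
	[SIM_AP_ONLINE_DYN_VMSTAT]	= { sim_vmstat_update, NULL },
	[SIM_AP_ONLINE_DYN_MIGRATE]	= { sim_demotion_update, sim_demotion_update },
};

/* Layout after this patch: dedicated states on both paths. */
static const struct sim_step sim_new_table[SIM_NR_STATES] = {
	[SIM_MM_DEMOTION_DEAD]		= { NULL, sim_demotion_update },
	[SIM_MM_VMSTAT_DEAD]		= { NULL, sim_vmstat_update },
	[SIM_AP_ONLINE_DYN_VMSTAT]	= { sim_vmstat_update, NULL },
	[SIM_AP_MM_DEMOTION_ONLINE]	= { sim_demotion_update, NULL },
};

/* Bring one CPU up: startup callbacks run in ascending state order. */
static void sim_cpu_up(const struct sim_step *table, struct sim_node *node)
{
	node->nr_cpus++;
	for (int s = 0; s < SIM_NR_STATES; s++)
		if (table[s].startup)
			table[s].startup(node);
}

/* Take one CPU down: teardown callbacks run in descending state order. */
static void sim_cpu_down(const struct sim_step *table, struct sim_node *node)
{
	assert(node->nr_cpus > 0);
	node->nr_cpus--;
	for (int s = SIM_NR_STATES - 1; s >= 0; s--)
		if (table[s].teardown)
			table[s].teardown(node);
}

/* True if the demotion update saw the final N_CPU value after online. */
static bool sim_online_sees_update(const struct sim_step *table)
{
	struct sim_node node = { 0 };

	sim_cpu_up(table, &node);
	return node.n_cpu_seen == node.n_cpu;
}

/* True if the demotion update saw the final N_CPU value after offline. */
static bool sim_offline_sees_update(const struct sim_step *table)
{
	struct sim_node node = { 0 };

	sim_cpu_up(table, &node);
	sim_cpu_down(table, &node);
	return node.n_cpu_seen == node.n_cpu;
}
```

In this model, sim_offline_sees_update() fails for sim_old_table because
the demotion teardown in the dynamic slot runs before the
SIM_MM_VMSTAT_DEAD teardown clears the N_CPU stand-in, and it succeeds
for sim_new_table on both the online and the offline path.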
 include/linux/cpuhotplug.h | 4 ++++
 mm/migrate.c               | 8 +++++---
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 832d8a74fa59..991911048857 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -72,6 +72,8 @@ enum cpuhp_state {
 	CPUHP_SLUB_DEAD,
 	CPUHP_DEBUG_OBJ_DEAD,
 	CPUHP_MM_WRITEBACK_DEAD,
+	/* Must be after CPUHP_MM_VMSTAT_DEAD */
+	CPUHP_MM_DEMOTION_DEAD,
 	CPUHP_MM_VMSTAT_DEAD,
 	CPUHP_SOFTIRQ_DEAD,
 	CPUHP_NET_MVNETA_DEAD,
@@ -240,6 +242,8 @@ enum cpuhp_state {
 	CPUHP_AP_BASE_CACHEINFO_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
 	CPUHP_AP_ONLINE_DYN_END		= CPUHP_AP_ONLINE_DYN + 30,
+	/* Must be after CPUHP_AP_ONLINE_DYN for node_states[N_CPU] update */
+	CPUHP_AP_MM_DEMOTION_ONLINE,
 	CPUHP_AP_X86_HPET_ONLINE,
 	CPUHP_AP_X86_KVM_CLK_ONLINE,
 	CPUHP_AP_DTPM_CPU_ONLINE,
diff --git a/mm/migrate.c b/mm/migrate.c
index c14a55004fee..7769abac8aad 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -3284,9 +3284,8 @@ static int __init migrate_on_reclaim_init(void)
 {
 	int ret;
 
-	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "migrate on reclaim",
-				migration_online_cpu,
-				migration_offline_cpu);
+	ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
+					NULL, migration_offline_cpu);
 	/*
 	 * In the unlikely case that this fails, the automatic
 	 * migration targets may become suboptimal for nodes
@@ -3294,6 +3293,9 @@ static int __init migrate_on_reclaim_init(void)
 	 * rare case, do not bother trying to do anything special.
 	 */
 	WARN_ON(ret < 0);
+	ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
+				migration_online_cpu, NULL);
+	WARN_ON(ret < 0);
 
 	hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
 	return 0;
--
2.30.2