Re: [PATCH 00/10] [v6] Migrate Pages in lieu of discard

From: Yang Shi
Date: Mon Mar 08 2021 - 19:35:20 EST


On Thu, Mar 4, 2021 at 4:00 PM Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
>
> The full series is also available here:
>
> https://github.com/hansendc/linux/tree/automigrate-20210304
>
> which also includes some vm.zone_reclaim_mode sysctl ABI fixup
> prerequisites.
>
> The meat of this patch is in:
>
> [PATCH 05/10] mm/migrate: demote pages during reclaim
>
> Which also has the most changes since the last post. This version is
> mostly to address review comments from Yang Shi and Oscar Salvador.
> Review comments are documented in the individual patch changelogs.
>
> This also contains a few prerequisite patches that fix up an issue
> with the vm.zone_reclaim_mode sysctl ABI.
>
> Changes since (automigrate-20210122):
> * move from GFP_HIGHUSER -> GFP_HIGHUSER_MOVABLE since pages *are*
> movable.
> * Separate out helpers that check for being able to reclaim anonymous
> pages versus being able to meaningfully scan the anon LRU.
>
> --
>
> We're starting to see systems with more and more kinds of memory such
> as Intel's implementation of persistent memory.
>
> Let's say you have a system with some DRAM and some persistent memory.
> Today, once DRAM fills up, reclaim will start and some of the DRAM
> contents will be thrown out. Allocations will, at some point, start
> falling over to the slower persistent memory.
>
> That has two nasty properties. First, the newer allocations can end
> up in the slower persistent memory. Second, reclaimed data in DRAM
> are just discarded even if there are gobs of space in persistent
> memory that could be used.
>
> This set implements a solution to these problems. At the end of the
> reclaim process in shrink_page_list(), just before the last page
> refcount is dropped, the page is migrated to persistent memory instead
> of being dropped.
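To illustrate the reclaim-time decision described above, here is a minimal user-space sketch. It is a toy model, not kernel code: struct toy_node, its fields, and toy_reclaim_one_page() are hypothetical names. It shows the idea of trying to migrate a reclaimed page to a slower "demotion target" node first, and falling back to normal reclaim (discarding) when there is no target or the target is full.

```c
#include <assert.h>

/* Hypothetical toy model, not kernel code: each node has a fixed
 * capacity (in pages), a count of pages in use, and an optional
 * slower node that its reclaimed pages can be demoted to. */
struct toy_node {
	int capacity;
	int used;
	int demotion_target; /* index of the slower node, or -1 if none */
};

enum toy_reclaim_result { PAGE_DEMOTED, PAGE_DISCARDED };

/* Reclaim one page from node 'src'. Instead of simply dropping the
 * page, first try to migrate it to the node's demotion target; if
 * there is no target, or the target is full, fall back to normal
 * reclaim, which throws the contents away. */
static enum toy_reclaim_result toy_reclaim_one_page(struct toy_node *nodes,
						    int src)
{
	struct toy_node *s = &nodes[src];
	int dst = s->demotion_target;

	assert(s->used > 0);
	s->used--; /* the page leaves the source node either way */

	if (dst >= 0 && nodes[dst].used < nodes[dst].capacity) {
		nodes[dst].used++; /* contents survive in slower memory */
		return PAGE_DEMOTED;
	}
	return PAGE_DISCARDED; /* normal reclaim: contents are dropped */
}
```

For example, with a full two-page "DRAM" node whose target is a one-page "PMEM" node, the first reclaimed page is demoted and the second is discarded because the target has no room left.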
>
> While I've talked about a DRAM/PMEM pairing, this approach would
> function in any environment where memory tiers exist.
>
> This is not perfect. It "strands" pages in slower memory and never
> brings them back to fast DRAM. Other things need to be built to
> promote hot pages back to DRAM.
>
> This is also all based on an upstream mechanism that allows
> persistent memory to be onlined and used as if it were volatile:
>
> http://lkml.kernel.org/r/20190124231441.37A4A305@xxxxxxxxxxxxxxxxxx
>
> == Open Issues ==
>
> * For cpusets and memory policies that restrict allocations
> to PMEM, is it OK to demote to PMEM? Do we need a cgroup-
> level API to opt-in or opt-out of these migrations?

I'm wondering whether such use cases, which don't want memory allocated
on pmem, will allow their memory to be swapped out or reclaimed. If swap
is allowed, then I fail to see why migrating to pmem should be
disallowed. If swap is not allowed, they should call mlock(), and then
the memory won't be migrated to pmem either.
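The argument above can be summarized as a small decision function. This is a hypothetical sketch: struct toy_page, struct toy_policy, and toy_may_demote() are illustrative names, not kernel APIs. It encodes that demotion is permitted exactly when reclaim is, and that mlock()ed pages are excluded because they sit on the unevictable LRU, where reclaim (and therefore demotion) never selects them.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-page state relevant to the argument. */
struct toy_page {
	bool mlocked; /* mlock()ed: on the unevictable LRU */
};

/* Hypothetical per-workload policy. */
struct toy_policy {
	bool allow_reclaim; /* memory may be swapped out or reclaimed */
};

/* Demotion is just another way of reclaiming a page, so allow it
 * whenever reclaim is allowed, and never for mlock()ed pages, which
 * reclaim does not scan in the first place. */
static bool toy_may_demote(const struct toy_page *page,
			   const struct toy_policy *pol)
{
	assert(page && pol);
	if (page->mlocked)
		return false;
	return pol->allow_reclaim;
}
```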

> * Could be more aggressive about where anon LRU scanning occurs
> since it no longer necessarily involves I/O. get_scan_count()
> for instance says: "If we have no swap space, do not bother
> scanning anon pages"

Yes, I agree. Johannes's patchset
(https://lore.kernel.org/linux-mm/20200520232525.798933-1-hannes@xxxxxxxxxxx/#r)
raised the maximum swappiness to 200 so that the anonymous LRU can be
scanned more aggressively. We could definitely tweak this if needed.
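A rough sketch of how the "no swap space" check could be relaxed once demotion exists. The names toy_can_reclaim_anon() and toy_anon_scan_percent() are hypothetical, and the pressure split is simplified: the real get_scan_count() also weights each LRU by its recent reclaim cost, which is assumed equal here so that swappiness (0..200) alone splits scan pressure as swappiness : (200 - swappiness).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: anonymous pages can be reclaimed if there is
 * swap space to write them to, or a slower node to demote them to. */
static bool toy_can_reclaim_anon(long nr_swap_pages, bool has_demotion_target)
{
	return nr_swap_pages > 0 || has_demotion_target;
}

/* Percentage of scan pressure aimed at the anon LRU. Simplified:
 * both lists are assumed to have equal reclaim cost, so only
 * swappiness decides the anon:file split. */
static int toy_anon_scan_percent(int swappiness, long nr_swap_pages,
				 bool has_demotion_target)
{
	assert(swappiness >= 0 && swappiness <= 200);
	/* Old rule: "If we have no swap space, do not bother scanning
	 * anon pages". A demotion target now counts as well. */
	if (!toy_can_reclaim_anon(nr_swap_pages, has_demotion_target))
		return 0;
	return swappiness * 100 / 200;
}
```

With no swap, the anon LRU gets no pressure unless a demotion target exists; at swappiness 200 all pressure goes to the anon LRU.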

>
> --
>
> Documentation/admin-guide/sysctl/vm.rst | 9
> include/linux/migrate.h | 20 +
> include/linux/swap.h | 3
> include/linux/vm_event_item.h | 2
> include/trace/events/migrate.h | 3
> include/uapi/linux/mempolicy.h | 1
> mm/compaction.c | 3
> mm/gup.c | 4
> mm/internal.h | 5
> mm/memory-failure.c | 4
> mm/memory_hotplug.c | 4
> mm/mempolicy.c | 8
> mm/migrate.c | 369 +++++++++++++++++++++++++++++---
> mm/page_alloc.c | 13 -
> mm/vmscan.c | 173 +++++++++++++--
> mm/vmstat.c | 2
> 16 files changed, 560 insertions(+), 63 deletions(-)
>
> --
>
> Changes since (automigrate-20200818):
> * Fall back to normal reclaim when demotion fails
> * Fix some compile issues when page migration and NUMA are off
>
> Changes since (automigrate-20201007):
> * separate out checks for "can scan anon LRU" from "can actually
> swap anon pages right now". Previous series conflated them
> and may have been overly aggressive scanning LRU
> * add MR_DEMOTION to tracepoint header
> * remove unnecessary hugetlb page check
>
> Changes since (https://lwn.net/Articles/824830/):
> * Use higher-level migrate_pages() API approach from Yang Shi's
> earlier patches.
> * made sure to actually check node_reclaim_mode's new bit
> * disabled migration entirely before introducing RECLAIM_MIGRATE
> * Replace GFP_NOWAIT with explicit __GFP_KSWAPD_RECLAIM and
> comment why we want that.
> * Comment on the effects of keeping multiple source nodes from
> sharing target nodes
>
> Cc: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Huang Ying <ying.huang@xxxxxxxxx>
> Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> Cc: David Hildenbrand <david@xxxxxxxxxx>
> Cc: osalvador <osalvador@xxxxxxx>
>
>