Re: [PATCH 02/10] mm/numa: automatically generate node migration order

From: Dave Hansen
Date: Thu Apr 15 2021 - 11:35:22 EST


On 4/14/21 9:07 PM, Wei Xu wrote:
> On Wed, Apr 14, 2021 at 1:08 AM Oscar Salvador <osalvador@xxxxxxx> wrote:
>> Fast class/memory are pictured as those nodes with CPUs, while Slow class/memory
>> are PMEM, right?
>> Then, what stands for medium class/memory?
>
> That is Dave's example. I think David's guess makes sense (HBM - fast, DRAM -
> medium, PMEM - slow). It may also be possible that we have DDR5 as fast,
> CXL-DDR4 as medium, and CXL-PMEM as slow. But the most likely use cases for
> now should be just two tiers: DRAM vs PMEM or other types of slower
> memory devices.

Yes, it would be nice to apply this to fancier tiering systems. But
DRAM/PMEM combos are out in the wild today and it's where I expect this
to be used first.
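
To make the tier ordering concrete, here is a rough userspace sketch
of a per-node demotion table for the three-tier example. It is not code
from this series: the node numbering, the demotion_target[] table, and
next_demotion_node_sketch() are all made up for illustration. The
two-tier DRAM/PMEM case would be the same idea with two entries.

```c
#include <stddef.h>

#define NO_DEMOTION_TARGET (-1)

/*
 * Hypothetical layout, assumed only for this example:
 *   node 0 = fast tier, node 1 = medium tier, node 2 = slow tier.
 * Each entry names the node that pages are demoted to next.
 */
static const int demotion_target[] = {
	1,                  /* node 0 (fast)   -> node 1 (medium) */
	2,                  /* node 1 (medium) -> node 2 (slow)   */
	NO_DEMOTION_TARGET, /* node 2 (slow) is terminal          */
};

/* Return the next node in the demotion order, or NO_DEMOTION_TARGET. */
static int next_demotion_node_sketch(int node)
{
	size_t nr = sizeof(demotion_target) / sizeof(demotion_target[0]);

	if (node < 0 || (size_t)node >= nr)
		return NO_DEMOTION_TARGET;
	return demotion_target[node];
}
```

Walking the table from node 0 visits each tier once and then stops,
which is the property a generated migration order needs: no cycles.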

> This can help enable more flexible demotion policies to be
> configured, such as to allow a cgroup to allocate from all fast tier
> nodes, but only demote to a local slow tier node. Such a policy can
> reduce memory stranding at the fast tier (compared to if memory
> hardwall is used) and still allow demotion from all fast tier nodes
> without incurring the expensive random accesses to the demoted pages
> if they were demoted to remote slow tier nodes.

Could you explain this stranding effect in a bit more detail? I'm not
quite following.