Re: [RFC PATCH v2 2/2] mm/damon/paddr: Allow multiple migrate targets

From: Bijan Tabatabai
Date: Mon Jun 23 2025 - 19:15:36 EST


[...]
> Thank you for walking through this with me, Bijan. I understand and agree with
> your concerns.
> Actually, this kind of unnecessary ping-pong is a general problem for DAMOS.
> We hence made a few DAMOS features to avoid this issue.
>
> The first feature is 'age' reset. DAMOS sets the 'age' of a region to zero when
> it applies an action to it. Hence, if your DAMOS scheme has a minimum 'age' for
> the target access pattern, the region will not be selected as an action target
> again very soon.
>
> The second feature is the quota. You can set a speed limit for a DAMOS action,
> to avoid DAMOS being too aggressive. When DAMOS finds memory regions that are
> eligible for a given action and larger than the given quota, it calculates the
> access temperature of the regions and applies the action to only the hottest or
> coldest regions, up to the quota amount. Whether to prioritize hotter or colder
> regions depends on the action. DAMOS_MIGRATE_HOT prefers hotter ones. Together
> with the age reset, this can reduce unnecessary ping-pong.
>
> The third feature is quota auto-tuning. You can ask DAMON to adjust the quotas
> on its own, based on some metrics. Let me describe an example with memory
> tiering use case. Consider there are two NUMA nodes of different speeds. Node
> 0 is faster than node 1 from every CPU. Then you can ask DAMON to migrate hot
> pages on node 1 to node 0, aiming for 99% of node 0 memory to be allocated,
> while migrating cold pages on node 0 to node 1, aiming for 1% of node 0 memory
> to be free. Then, DAMON will adjust the quotas for the two schemes based on the
> current node 0 memory used/free amounts. If node 0 memory is used less than
> 99%, the hot pages migration scheme will work. The aggressiveness will be
> determined by the difference between the current memory usage and the target
> usage. For example, DAMON will try to migrate hot pages faster when node 0
> memory usage is 50%, compared to when node 0 memory usage is 98%. The cold
> pages migration scheme will do nothing when node 0 memory is used less than
> 99%, since its goal (1% node 0 free memory ratio) is already over-achieved.
> When the node 0 memory usage becomes 99% and no more allocation is made, DAMON
> will be quiet. Even if a few more allocations happen, DAMON will work at a slow
> speed, and hence make only a reasonable and healthy amount of noise.
>
> Back to your use case, you could set per-node ideal memory usage of
> interleaving as the quota goal. For example, on the 1:1 ratio interleaving on
> 2 NUMA nodes, you could use two DAMOS schemes, one aiming for 50% node 0
> memused, and the other aiming for 50% node 0 memfree. Once pages are well
> interleaved, both schemes will stop working, avoiding unnecessary ping-ponging.
>
> Note that one of the quota auto-tuning metrics that DAMON supports is arbitrary
> user input. When this is being used, users can simply feed any value as the
> current value of the goal metric. For example, you can use an application's
> performance metric, memory bandwidth, or whatever. You could see the
> node0-node1 balance from your user-space tool and feed it to DAMON quota
> auto-tuning. Then, DAMON will do more migration when it is imbalanced, and no
> more migration when it is well balanced.
>
> Finally, you can change DAMON parameters including schemes while DAMON is
> running. You can add and remove schemes whenever you want, while DAMON keeps
> monitoring the access pattern. Your user-space tool can determine how
> aggressive the migration needs to be based on the current memory balance and
> adjust DAMOS quotas online, or even turn DAMOS schemes off/on on demand.
>
> So I think you could avoid the problem using these features. Does this make
> sense to you?
>
> In the future, we could add more DAMOS self-feedback metrics for this use case.
> For example, the memory usage balance of nodes. My self-tuning example above
> was using two schemes since there is no DAMOS quota goal tuning metric that can
> directly be used for your use case. But I'd say that shouldn't be a blocker for
> this work.


Hi SeongJae,

I really appreciate your detailed response.
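
If I understand the arbitrary-user-input goal correctly, my user-space
tool would do something roughly like the sketch below: periodically
compute node 0's share of the used memory across the two nodes and feed
it to a goal's current_value, with target_value set to the ideal
interleave share. The goal sysfs path is just my reading of the DAMON
usage document, so please correct me if I have the interface wrong.

/*
 * Rough user-space sketch of the "arbitrary user input" quota goal idea as I
 * understand it.  The goal sysfs path below is an assumption based on my
 * reading of the DAMON sysfs usage document; the node meminfo parsing is the
 * standard /sys/devices/system/node interface.
 */
#include <stdio.h>
#include <string.h>

/* Assumed path of a DAMOS quota goal that uses the "user_input" metric. */
#define GOAL_DIR "/sys/kernel/mm/damon/admin/kdamonds/0/contexts/0/" \
		 "schemes/0/quotas/goals/0"

/* Return used memory of a NUMA node in kB, or -1 on error. */
static long long node_used_kb(int nid)
{
	char path[64], key[32];
	long long total = -1, free_kb = -1, val;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/node/node%d/meminfo", nid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	/* Lines look like: "Node 0 MemTotal:       16315648 kB" */
	while (fscanf(f, "Node %*d %31[^:]: %lld kB\n", key, &val) == 2) {
		if (!strcmp(key, "MemTotal"))
			total = val;
		else if (!strcmp(key, "MemFree"))
			free_kb = val;
	}
	fclose(f);
	return (total < 0 || free_kb < 0) ? -1 : total - free_kb;
}

int main(void)
{
	long long used0 = node_used_kb(0), used1 = node_used_kb(1);
	FILE *f;

	if (used0 < 0 || used1 < 0 || used0 + used1 == 0)
		return 1;
	/*
	 * Feed node 0's share of used memory (in per-mil) as the goal's
	 * current value.  target_value would have been set up front to the
	 * ideal share, e.g. 500 for 1:1 interleaving.  DAMON would then
	 * scale the scheme's quota by how far current_value is from
	 * target_value (presumably one goal per direction, as in your
	 * two-scheme example).
	 */
	f = fopen(GOAL_DIR "/current_value", "w");
	if (!f)
		return 1;
	fprintf(f, "%lld\n", used0 * 1000 / (used0 + used1));
	fclose(f);
	return 0;
}

That does give user space a knob to throttle the migration schemes based
on the observed balance.
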
The quota auto-tuning helps, but I feel like it's still not exactly
what I want. For example, I think a quota goal that stops migration
based on the memory usage balance gets quite a bit more complicated
when, instead of interleaving all data, we are only interleaving *hot*
data. I haven't looked at it extensively, but I imagine it wouldn't be
easy to identify how much data is hot in the paddr setting, especially
because the regions can contain a significant amount of unallocated
data.

Also, if the interleave weights changed, for example, from 11:9 to
10:10, it would be preferable if only 5% of the data were migrated;
however, with the round-robin approach, 50% would be (the toy sketch
below illustrates this).

Finally, and I forgot to mention this in my last message, the
round-robin approach does away with any notion of spatial locality,
which helps the effectiveness of interleaving [1]. I don't think
anything done with quotas can get around that. I wonder if there's an
elegant way to specify whether or not to use rmap, but my initial
feeling is that it might just add complication to the code and
interface for not enough benefit.
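
To make the weight-change point concrete, here is a toy user-space
model of deterministic, offset-based placement. It is only an
illustration in the spirit of the weighted interleave logic in [1], not
the actual kernel or DAMON code:

/*
 * Toy model: a page's target node is a pure function of its offset and the
 * interleave weights, in the spirit of the weighted interleave logic in
 * mm/mempolicy.c [1].  Not real kernel or DAMON code.
 */
#include <stdio.h>

static int target_node(unsigned long idx, const int *weights, int nr_nodes)
{
	int node, total = 0;
	unsigned long pos;

	for (node = 0; node < nr_nodes; node++)
		total += weights[node];
	pos = idx % total;	/* position within one interleave round */
	for (node = 0; node < nr_nodes; node++) {
		if (pos < (unsigned long)weights[node])
			return node;
		pos -= weights[node];
	}
	return 0;
}

int main(void)
{
	const int old_w[2] = { 11, 9 }, new_w[2] = { 10, 10 };
	const unsigned long nr_pages = 100000;
	unsigned long idx, moved = 0;

	for (idx = 0; idx < nr_pages; idx++)
		if (target_node(idx, old_w, 2) != target_node(idx, new_w, 2))
			moved++;

	/* Prints 5000 / 100000 (5.0%): only the slots that changed move. */
	printf("pages whose target node changed: %lu / %lu (%.1f%%)\n",
	       moved, nr_pages, 100.0 * moved / nr_pages);
	return 0;
}

Only the pages whose deterministic slot changes (1 in 20, i.e. 5%) get
a new target node, whereas a fresh round-robin pass over the hot pages
reassigns destinations regardless of where a page already is, and also
loses the contiguity that makes interleaving effective.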

Maybe, as you suggest later on, this is an indication that my use case
is a better fit for a vaddr scheme. I'll get into that more below.

> > Using the VMA offset to determine where a page
> > should be placed avoids this problem because it gives a folio a single
> > node it can be in for a given set of interleave weights. This means
> > that in steady state, no folios will be migrated.
>
> This makes sense for this use case. But I don't think it makes the same sense
> for other possible use cases, like memory tiering on systems having multiple
> NUMA nodes of the same tier.

I see where you're coming from. I think the crux of this difference is
that in my use case, the set of nodes we are monitoring is the same as
the set of nodes we are migrating to, while in the use case you
describe, the set of nodes being monitored is disjoint from the set of
migration target nodes. I think this in particular makes ping-ponging
more of a problem for my use case, compared to promotion/demotion
schemes.

> If you really need this virtual address space based
> deterministic behavior, it would make more sense to use virtual address spaces
> monitoring (damon-vaddr).

Maybe it does make sense for me to implement vaddr versions of the
migrate actions for my use case. One thing that gives me pause about
this is that, from what I understand, it would be harder to have vaddr
schemes apply to processes that start after DAMON begins. I think to
do that, one would have to detect when a process starts and then do a
DAMON tune to update the targets list (roughly sketched below)? It
would be nice if,
say, you could specify a cgroup as a vaddr target and track all
processes in that cgroup, but that would be a different patchset for
another day.
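
Just to make concrete what I mean by detecting process starts,
something like the loop below is what I'd expect the user-space side to
look like. The cgroup path is made up, and retune_damon() is only a
placeholder for whatever interface (e.g. rewriting the target PIDs and
committing the change) would actually be used:

/*
 * Rough sketch of the polling I have in mind: watch a cgroup's cgroup.procs
 * for changes in the PID set, and re-tune DAMON's vaddr targets when it
 * changes.  The cgroup path is hypothetical and retune_damon() is a
 * placeholder.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define CGROUP_PROCS "/sys/fs/cgroup/myworkload/cgroup.procs" /* hypothetical */
#define MAX_PIDS 1024

static int read_pids(long *pids, int max)
{
	FILE *f = fopen(CGROUP_PROCS, "r");
	int n = 0;

	if (!f)
		return -1;
	while (n < max && fscanf(f, "%ld", &pids[n]) == 1)
		n++;
	fclose(f);
	return n;
}

static void retune_damon(const long *pids, int n)
{
	/* Placeholder: hand this PID list to DAMON as the new vaddr targets. */
	printf("would re-tune DAMON with %d target processes\n", n);
}

int main(void)
{
	long prev[MAX_PIDS], cur[MAX_PIDS];
	int nprev = -1, ncur;

	for (;;) {
		ncur = read_pids(cur, MAX_PIDS);
		/* Naive comparison; assumes cgroup.procs ordering is stable. */
		if (ncur >= 0 && (ncur != nprev ||
				  memcmp(cur, prev, ncur * sizeof(*cur)))) {
			retune_damon(cur, ncur);
			memcpy(prev, cur, ncur * sizeof(*cur));
			nprev = ncur;
		}
		sleep(1);
	}
	return 0;
}

It works, but it feels more fragile than simply pointing DAMON at the
cgroup itself.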

But using vaddr has other benefits; for example, the sampling would
take into account the locality of the accesses. There are also ways to
make vaddr sampling more efficient by using higher levels of the page
tables that I don't think apply to paddr schemes [2]. I believe the
authors of [2] said they submitted their patches to the kernel, but I
don't know whether they have been upstreamed (sorry about derailing the
conversation slightly).

[1] https://elixir.bootlin.com/linux/v6.16-rc3/source/mm/mempolicy.c#L213
[2] https://www.usenix.org/conference/atc24/presentation/nair