Re: [RFC] Add per-socket weight support for multi-socket systems in weighted interleave
From: Gregory Price
Date: Thu May 08 2025 - 11:19:37 EST
On Thu, May 08, 2025 at 03:30:36PM +0900, Rakie Kim wrote:
> On Wed, 7 May 2025 12:38:18 -0400 Gregory Price <gourry@xxxxxxxxxx> wrote:
>
> The proposed design is completely optional and isolated: it retains the
> existing flat weight model as-is and activates the source-aware behavior only
> when 'multi' mode is enabled. The complexity is scoped entirely to users who
> opt into this mode.
>
I get what you're going for, just expressing my experience around this
issue specifically.

The lack of enthusiasm for solving the cross-socket case, and thus the
reduction from a 2D array to a 1D array, was because reasoning about
interleave w/ cross-socket interconnects is not really feasible with
the NUMA abstraction. Cross-socket interconnects are "invisible" but
have real performance implications. Unless we have a way to:

1) Represent the topology, AND
2) Get performance data about that topology

it's not useful. So NUMA is an incomplete (if not wrong) tool for this.
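
To make the 2D vs 1D distinction concrete, here is a rough userspace
sketch, not kernel code. Every name in it (MAX_NODES, flat_weight,
src_weight, pick_node, pick_flat, pick_src_aware) is hypothetical, and
the weight values are invented. That is the point: the numbers in the
2D table would have to come from topology and performance data that
the NUMA abstraction does not expose today.

```c
#define MAX_NODES 4	/* hypothetical node count, for illustration only */

/* Flat (1D) model: one weight per target node, same for every task. */
static const unsigned int flat_weight[MAX_NODES] = { 3, 3, 1, 1 };

/*
 * Source-aware (2D) model: the weight of a target node depends on the
 * node the task is running on. These values are made up.
 */
static const unsigned int src_weight[MAX_NODES][MAX_NODES] = {
	{ 4, 1, 2, 1 },		/* task on node 0 */
	{ 1, 4, 1, 2 },		/* task on node 1 */
	{ 2, 1, 4, 1 },		/* task on node 2 */
	{ 1, 2, 1, 4 },		/* task on node 3 */
};

/*
 * Map the n-th allocation to a node by walking the weights in node
 * order. Returns -1 if every weight is zero.
 */
static int pick_node(const unsigned int *w, unsigned int n)
{
	unsigned int total = 0;
	unsigned int i;

	for (i = 0; i < MAX_NODES; i++)
		total += w[i];
	if (total == 0)
		return -1;

	n %= total;
	for (i = 0; i < MAX_NODES; i++) {
		if (n < w[i])
			return (int)i;
		n -= w[i];
	}
	return -1;	/* not reached when total > 0 */
}

/* 1D lookup: the source node plays no role. */
static int pick_flat(unsigned int n)
{
	return pick_node(flat_weight, n);
}

/* 2D lookup: the row is selected by the task's current node. */
static int pick_src_aware(int src, unsigned int n)
{
	return pick_node(src_weight[src], n);
}
```

In this sketch the same allocation index lands on a different node
depending on the source row, so the placement is only as good as the
per-source rows, and a task that migrates is suddenly reading from a
different row than the one its existing pages were placed with.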
Additionally, reacting to task migration is not a real issue. If
you're deploying an allocation strategy, you probably don't want your
task migrating away from the place where you just spent a bunch of time
allocating based on some existing strategy. So the solution is: don't
migrate, and if you do, don't use cross-socket interleave.

Maybe if we solve the first half of this we can take a look at the task
migration piece again, but I wouldn't try to solve for migration.

At the same time we were discussing this, we were also discussing how to
do external task-mempolicy modifications, which seemed significantly
more useful, but ultimately more complex and without sufficient
interested parties / users.

~Gregory