Re: [PATCH 08/11] sched/numa: Bias swapping tasks based on their preferred node

From: Mel Gorman
Date: Thu Feb 13 2020 - 06:18:09 EST


On Thu, Feb 13, 2020 at 11:31:08AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 12, 2020 at 09:36:51AM +0000, Mel Gorman wrote:
> > When swapping tasks for NUMA balancing, it is preferred that tasks move
> > to or remain on their preferred node. When considering an imbalance,
> > encourage tasks to move to their preferred node and discourage tasks from
> > moving away from their preferred node.
>
> Wasn't there an issue for workloads that span multiple nodes?
>

Sort of, yes -- specifically workloads that could not fit inside a node
for whatever reason.

> Say a 4 node system with 2 warehouses? Then each JVM will want 2 nodes,
> instead of a single node, and strong preferred node stuff makes it
> difficult to achieve this.
>
> I forgot how we dealt with these cases, just something I worry about
> when reading this.

We deal with it in task_numa_migrate() by considering nodes other
than the preferred node for placement -- see the comment "Look at other
nodes in these cases" -- followed by a sched_setnuma() if the chosen
node doesn't match the preferred node.
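
To make the ordering concrete, here is a rough userspace sketch of that
flow. It is a simplified model and not the code in kernel/sched/fair.c:
struct node_info, pick_node(), update_preferred() and the single
"improvement" score are hypothetical names made up for illustration, and
the real code weighs task and group faults separately.

```c
#include <stdbool.h>

#define NO_NODE (-1)

/* Hypothetical, simplified per-node view; not a kernel structure. */
struct node_info {
	bool has_capacity;	/* a usable CPU was found on this node */
	int improvement;	/* assumed score gain relative to the source node */
};

/*
 * Try the preferred node first. Widen the search to other nodes only if
 * nothing was found there, or if the task's group already spans more
 * than one node. Returns the chosen node or NO_NODE.
 */
static int pick_node(const struct node_info *nodes, int nr_nodes,
		     int src_nid, int preferred_nid, int group_active_nodes)
{
	int best_nid = NO_NODE;
	int best_imp = 0;
	int nid;

	if (preferred_nid != src_nid &&
	    nodes[preferred_nid].has_capacity &&
	    nodes[preferred_nid].improvement > 0) {
		best_nid = preferred_nid;
		best_imp = nodes[preferred_nid].improvement;
	}

	if (best_nid == NO_NODE || group_active_nodes > 1) {
		for (nid = 0; nid < nr_nodes; nid++) {
			if (nid == src_nid || nid == preferred_nid)
				continue;
			if (!nodes[nid].has_capacity)
				continue;
			/* Keep only the best improvement seen so far. */
			if (nodes[nid].improvement > best_imp) {
				best_nid = nid;
				best_imp = nodes[nid].improvement;
			}
		}
	}

	return best_nid;
}

/*
 * Stand-in for the sched_setnuma() step: remember the node that was
 * actually chosen if it differs from the current preferred node.
 */
static int update_preferred(int preferred_nid, int chosen_nid)
{
	if (chosen_nid != NO_NODE && chosen_nid != preferred_nid)
		return chosen_nid;
	return preferred_nid;
}
```

In this model a task whose preferred node is full ends up on the best
alternative and that node is then recorded as preferred, while a task
whose preferred node has room and whose group sits on one node never
triggers the wider search.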

We do not do any special casing as such in task_numa_compare() other than
finding the best improvement, so we can pick a task belonging to a group
spanning multiple nodes with or without this patch. A workload spanning
multiple nodes does not in itself justify a full search if it can be
avoided.

--
Mel Gorman
SUSE Labs