Re: [RFC 2/3] sched/topology: fix sched groups on NUMA machines with mesh topology

From: Peter Zijlstra
Date: Fri Apr 14 2017 - 12:59:44 EST


On Fri, Apr 14, 2017 at 01:38:13PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 13, 2017 at 10:56:08AM -0300, Lauro Ramos Venancio wrote:
> > This patch constructs the sched groups from each CPU's perspective. So,
> > on a 4-node machine with ring topology, while nodes 0 and 2 keep the
> > same groups as before [(3, 0, 1)(1, 2, 3)], nodes 1 and 3 have new
> > groups [(0, 1, 2)(2, 3, 0)]. This allows moving tasks between any two
> > nodes 2 hops apart.
>
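
As an aside, that construction is easy to model in userspace. The sketch
below is a toy, not kernel code: it assumes the domain span covers all
four ring nodes and that each node's group mask is its 1-hop
neighbourhood. It reproduces the per-node groups quoted above (as sets;
element order differs):

#include <stdio.h>

#define NODES 4

/* Toy model: 1-hop neighbourhood of a node on a 4-node ring. */
static unsigned int neighbourhood(int n)
{
	return (1u << ((n + NODES - 1) % NODES)) |
	       (1u << n) |
	       (1u << ((n + 1) % NODES));
}

int main(void)
{
	for (int node = 0; node < NODES; node++) {
		unsigned int covered = 0;

		printf("node %d:", node);
		/* Walk the span starting at the local node, wrapping
		 * around, and emit one group per uncovered node. */
		for (int k = 0; k < NODES; k++) {
			int i = (node + k) % NODES;

			if (covered & (1u << i))
				continue;
			covered |= neighbourhood(i);
			printf("  (");
			for (int j = 0; j < NODES; j++)
				if (neighbourhood(i) & (1u << j))
					printf(" %d", j);
			printf(" )");
		}
		printf("\n");
	}
	return 0;
}

Starting each walk at the local node is what gives nodes 1 and 3 their
own groups.
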
> Ah,.. so after drawing pictures I see what went wrong; duh :-(
>
> An equivalent patch would be (if for_each_cpu_wrap() were exposed):
>
> @@ -521,11 +588,11 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
>  	struct cpumask *covered = sched_domains_tmpmask;
>  	struct sd_data *sdd = sd->private;
>  	struct sched_domain *sibling;
> -	int i;
> +	int i, wrap;
>  
>  	cpumask_clear(covered);
>  
> -	for_each_cpu(i, span) {
> +	for_each_cpu_wrap(i, span, cpu, wrap) {
>  		struct cpumask *sg_span;
>  
>  		if (cpumask_test_cpu(i, covered))
>
>
> We need to start iterating at @cpu, not start at 0 every time.
>
>
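
The wrap iteration itself is simple to picture. Here is a minimal
userspace stand-in; it mimics the semantics of for_each_cpu_wrap()
(visit every set bit of @mask exactly once, starting at @start and
wrapping past the end), not the kernel's actual macro:

#include <stdio.h>

#define NR_BITS 8

/* Visit each set bit of @mask once, starting at @start and wrapping. */
static void walk_wrap(unsigned int mask, int start)
{
	for (int k = 0; k < NR_BITS; k++) {
		int cpu = (start + k) % NR_BITS;

		if (mask & (1u << cpu))
			printf(" %d", cpu);
	}
	printf("\n");
}

int main(void)
{
	unsigned int span = 0x0f;	/* nodes 0-3 */

	printf("start at 0:"); walk_wrap(span, 0);	/* 0 1 2 3 */
	printf("start at 1:"); walk_wrap(span, 1);	/* 1 2 3 0 */
	printf("start at 3:"); walk_wrap(span, 3);	/* 3 0 1 2 */
	return 0;
}

With the span covering nodes 0-3, starting at 1 visits 1 2 3 0, so node
1's first group is built around node 1 instead of node 0.
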

OK, please have a look here:

https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=sched/core