RE: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86

From: Song Bao Hua (Barry Song)
Date: Tue Apr 20 2021 - 18:31:52 EST




> -----Original Message-----
> From: Tim Chen [mailto:tim.c.chen@xxxxxxxxxxxxxxx]
> Sent: Wednesday, April 21, 2021 6:32 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>;
> catalin.marinas@xxxxxxx; will@xxxxxxxxxx; rjw@xxxxxxxxxxxxx;
> vincent.guittot@xxxxxxxxxx; bp@xxxxxxxxx; tglx@xxxxxxxxxxxxx;
> mingo@xxxxxxxxxx; lenb@xxxxxxxxxx; peterz@xxxxxxxxxxxxx;
> dietmar.eggemann@xxxxxxx; rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx;
> mgorman@xxxxxxx
> Cc: msys.mizuma@xxxxxxxxx; valentin.schneider@xxxxxxx;
> gregkh@xxxxxxxxxxxxxxxxxxx; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>;
> juri.lelli@xxxxxxxxxx; mark.rutland@xxxxxxx; sudeep.holla@xxxxxxx;
> aubrey.li@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; linux-acpi@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx;
> xuwei (O) <xuwei5@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>;
> guodong.xu@xxxxxxxxxx; yangyicong <yangyicong@xxxxxxxxxx>; Liguozhu (Kenneth)
> <liguozhu@xxxxxxxxxxxxx>; linuxarm@xxxxxxxxxxxxx; hpa@xxxxxxxxx
> Subject: Re: [RFC PATCH v5 4/4] scheduler: Add cluster scheduler level for x86
>
>
>
> On 3/23/21 4:21 PM, Song Bao Hua (Barry Song) wrote:
>
> >>
> >> On 3/18/21 9:16 PM, Barry Song wrote:
> >>> From: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> >>>
> >>> There are x86 CPU architectures (e.g. Jacobsville) where L2 cache
> >>> is shared among a cluster of cores instead of being exclusive
> >>> to one single core.
> >>>
> >>> To prevent oversubscription of L2 cache, load should be
> >>> balanced between such L2 clusters, especially for tasks with
> >>> no shared data.
> >>>
> >>> Also, with a cluster scheduling policy where tasks are woken up
> >>> in the same L2 cluster, we will benefit from keeping related
> >>> tasks, which likely share data, in the same L2 cluster.
> >>>
> >>> Add CPU masks of CPUs sharing the L2 cache so we can build such
> >>> L2 cluster scheduler domain.
> >>>
> >>> Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
> >>> Signed-off-by: Barry Song <song.bao.hua@xxxxxxxxxxxxx>
> >>
> >>
> >> Barry,
> >>
> >> Can you also add this chunk to the patch.
> >> Thanks.
> >
> > Sure, Tim, Thanks. I'll put that into patch 4/4 in v6.
> >
>
> Barry,
>
> This chunk will also need to be added to return cluster id for x86.
> Please add it in your next rev.

Yes. Thanks. I'll put this in either RFC v7 or Patch v1.

The spreading path is much easier; the packing path is the tricky one.
But RFC v6 seems quite close to what we want to achieve: it packs
related tasks by scanning the cluster for an idle CPU within the same
NUMA node:
https://lore.kernel.org/lkml/20210420001844.9116-1-song.bao.hua@xxxxxxxxxxxxx/

If a pair of related tasks is already in the same LLC (NUMA node),
scanning clusters will gather them further. If they are running in
different NUMA nodes, the original LLC scan will move them to the same
node; after that, the cluster scan can put them closer to each other.

That is essentially the two-level packing Dietmar has suggested,
roughly as in the sketch below.
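
Just to illustrate the idea (this is only a sketch, not the actual
RFC v6 code; cpu_clustergroup_mask() is from this series, while
cpu_llc_shared_mask() is the existing x86 helper, and the function
name is made up):

/* Sketch: two-level wakeup packing -- L2 cluster first, then the LLC. */
static int select_idle_cpu_two_level(struct task_struct *p, int target)
{
	const struct cpumask *cluster = cpu_clustergroup_mask(target);
	const struct cpumask *llc = cpu_llc_shared_mask(target);
	int cpu;

	/* Level 1: look for an idle CPU inside target's L2 cluster. */
	for_each_cpu_and(cpu, cluster, p->cpus_ptr) {
		if (available_idle_cpu(cpu))
			return cpu;
	}

	/* Level 2: fall back to the rest of the LLC. */
	for_each_cpu_and(cpu, llc, p->cpus_ptr) {
		if (cpumask_test_cpu(cpu, cluster))
			continue;	/* already scanned above */
		if (available_idle_cpu(cpu))
			return cpu;
	}

	return target;
}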

So perhaps we won't need an RFC v7; I will probably send patch v1 afterwards.

>
> Thanks.
>
> Tim
>
> ---
>
> diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
> index 800fa48c9fcd..2548d824f103 100644
> --- a/arch/x86/include/asm/topology.h
> +++ b/arch/x86/include/asm/topology.h
> @@ -109,6 +109,7 @@ extern const struct cpumask *cpu_clustergroup_mask(int cpu);
> #define topology_physical_package_id(cpu) (cpu_data(cpu).phys_proc_id)
> #define topology_logical_die_id(cpu) (cpu_data(cpu).logical_die_id)
> #define topology_die_id(cpu) (cpu_data(cpu).cpu_die_id)
> +#define topology_cluster_id(cpu) (per_cpu(cpu_l2c_id, cpu))
> #define topology_core_id(cpu) (cpu_data(cpu).cpu_core_id)
>
> extern unsigned int __max_die_per_package;
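
The chunk above looks good to me. As an illustration of how generic
code could consume the new macro (cpus_share_cluster() below is a
made-up helper for this example, not part of the series):

/* Hypothetical helper: true if two CPUs sit in the same L2 cluster. */
static inline bool cpus_share_cluster(int cpu1, int cpu2)
{
	return topology_cluster_id(cpu1) == topology_cluster_id(cpu2);
}

Since cpu_l2c_id is the same for all CPUs sharing an L2, this gives a
cheap cluster-affinity test.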

Thanks
Barry