Re: [PATCH 00/13] Reconcile NUMA balancing decisions with the load balancer v6

From: Mel Gorman
Date: Thu May 07 2020 - 11:54:28 EST


On Thu, May 07, 2020 at 05:24:17PM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> > > Yes, it's indeed OMP. With low threads count, I mean up to 2x number of
> > > NUMA nodes (8 threads on 4 NUMA node servers, 16 threads on 8 NUMA node
> > > servers).
> >
> > Ok, so we know it's within the imbalance threshold where a NUMA node can
> > be left idle.
>
> my colleagues and I discussed today the performance drop seen with
> some workloads at low thread counts (roughly up to 2x the number of
> NUMA nodes). We are worried that it can be a severe issue for some use
> cases, which require full memory bandwidth even when only part of the
> CPUs is used.
>
> We understand that the scheduler cannot distinguish this type of
> workload from others automatically. However, there was an idea for a
> *new kernel tunable to control the imbalance threshold*. Based on the
> purpose of the server, users could set this tunable. See the tuned
> project, which allows creating performance profiles [1].
>

I'm not completely opposed to it but, given that the setting is global,
I imagine it could have other consequences if two applications run
at different times have different requirements. Given that it's OMP,
I would have imagined that an application that really cared about this
would specify what was needed using OMP_PLACES. Why would someone prefer
kernel tuning or a tuned profile over OMP_PLACES? After all, it requires
specific knowledge of the application even to know that a particular
tuned profile is needed.
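For illustration, that kind of per-application placement can be requested
with the standard OpenMP environment variables rather than a global kernel
tunable. A sketch (the thread count and application name are placeholders,
not taken from this thread):

```shell
# Sketch: bind OpenMP threads across NUMA nodes from userspace.
# The values below are illustrative, e.g. for a 4-node machine.
export OMP_NUM_THREADS=8       # e.g. 2x the number of NUMA nodes
export OMP_PLACES=sockets      # one place per socket/NUMA node
export OMP_PROC_BIND=spread    # spread threads across those places
# ./your_omp_application      # hypothetical binary, run with these bindings
```

With OMP_PROC_BIND=spread the runtime distributes threads as evenly as
possible over the places, so each NUMA node's memory controller is used
even at low thread counts.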

--
Mel Gorman
SUSE Labs