Re: [PATCH v4 00/10] sched/fair: rework the CFS load balance

From: Ingo Molnar
Date: Mon Nov 18 2019 - 08:15:57 EST



* Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:

> On Mon, Oct 21, 2019 at 09:50:38AM +0200, Ingo Molnar wrote:
> > > <SNIP>
> >
> > Thanks, that's an excellent series!
> >
>
> Agreed, despite the level of whining and complaining I did during the
> review.

I saw no whining and complaining whatsoever, and thanks for the feedback!
:-)

>
> > I've queued it up in sched/core with a handful of readability edits to
> > comments and changelogs.
> >
> > There are some upstreaming caveats though, I expect this series to be a
> > performance regression magnet:
> >
> > - load_balance() and wake-up changes are invariably such: some
> > workloads only work/scale well by accident, and if we touch the
> > logic, it might flip over into a less advantageous scheduling
> > pattern.
> >
> > - In particular, the changes from balancing and waking on runnable
> > load to full load that includes blocked tasks *will* shift
> > IO-intensive workloads that your tests don't fully capture, I
> > believe. You also made idle balancing more aggressive in essence,
> > which might reduce cache locality for some workloads.
> >
> > A full run on Mel Gorman's magic scalability test-suite would be super
> > useful ...
> >
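
A side note on the second caveat above, since the distinction matters
when interpreting results: "runnable load" only counts tasks that are
on the runqueue, while the full load also keeps the decaying
contribution of tasks that blocked (on IO, say). A toy model of the
two metrics (purely illustrative, not kernel code; every name in it
is made up):

#include <stdio.h>

struct toy_task {
	unsigned long weight;	/* load weight of the task */
	int runnable;		/* 1 if currently on the runqueue */
	unsigned int decay_pct;	/* % of blocked contribution not yet decayed */
};

/* Sum only the tasks that are currently runnable. */
static unsigned long runnable_load(const struct toy_task *t, int n)
{
	unsigned long sum = 0;

	for (int i = 0; i < n; i++)
		if (t[i].runnable)
			sum += t[i].weight;
	return sum;
}

/* Also count the decaying contribution of blocked tasks. */
static unsigned long full_load(const struct toy_task *t, int n)
{
	unsigned long sum = 0;

	for (int i = 0; i < n; i++)
		sum += t[i].runnable ? t[i].weight
				     : t[i].weight * t[i].decay_pct / 100;
	return sum;
}

int main(void)
{
	/* Two runnable tasks plus one that recently blocked on IO. */
	struct toy_task rq[] = {
		{ .weight = 1024, .runnable = 1 },
		{ .weight = 1024, .runnable = 1 },
		{ .weight = 1024, .runnable = 0, .decay_pct = 60 },
	};

	printf("runnable load: %lu\n", runnable_load(rq, 3)); /* 2048 */
	printf("full load:     %lu\n", full_load(rq, 3));     /* 2662 */
	return 0;
}

Under the runnable metric the sleeping task is invisible to the
balancer; under the full metric its leftover contribution still
counts, which is exactly what can shift placement decisions for
IO-intensive workloads.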
>
> I queued this back on the 21st and it took this long for me to get back
> to it.
>
> What I tested did not include the fix for the last patch, so I cannot
> say the data is that useful. I also failed to include something that
> exercised the IO paths in a way that idles rapidly, as that can catch
> interesting details (usually cpufreq related, but sometimes
> load-balancing related). There was no real thinking behind this
> decision; I just used an old collection of tests to get a general feel
> for the series.

I have just applied Vincent's fix to find_idlest_group(), so that will
probably modify some of the results. (Hopefully for the better.)

Will push it out later today-ish.

> Most of the results were performance-neutral, with some notable gains
> (kernel compiles were 1-6% faster depending on the -j count). Hackbench
> saw a disproportionate gain in terms of performance, but I tend to be
> wary of hackbench as improving it is rarely a universal win. There
> tends to be some jitter around the point where a NUMA node's worth of
> CPUs gets overloaded. tbench (mmtests configuration network-tbench) on
> a NUMA machine showed gains for low and high thread counts, but a loss
> near the boundary where a single node would get overloaded.
>
> Some NAS-related workloads saw a drop in performance on NUMA machines,
> but the size class might be too small to be certain; I'd have to rerun
> with the D class to be sure. The strangest drop in performance was the
> elapsed time to run the git test suite (mmtests configuration
> workload-shellscripts, modified to use a fresh XFS partition), which
> took 17.61% longer on a UMA Skylake machine. This *might* be due to
> the missing fix, because it is mostly a single-task workload.

Thanks a lot for your testing!

> I'm not going to go through the results in detail because I think
> another full round of testing would be required to take the fix into
> account. I'd also prefer to wait to see if the review results in any
> material change to the series.

I'll try to make sure it all gets addressed.

Thanks,

Ingo