Re: [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups

From: Andrea Arcangeli
Date: Wed Jan 09 2019 - 13:02:31 EST


Hello Mike,

On Wed, Jan 09, 2019 at 05:19:48AM +0100, Mike Galbraith wrote:
> On Tue, 2019-01-08 at 22:49 -0500, Andrea Arcangeli wrote:
> > Hello,
> >
> > we noticed some unexpected performance regressions in the scheduler by
> > switching the guest CPU topology from "-smp 2,sockets=2,cores=1" to
> > "-smp 2,sockets=1,cores=2".
*snip*
> > To test I used this trivial program.
>
> Which highlights the problem. That proggy really is synchronous, but

Note that I wrote the program only after the guest scheduler
regression was reported, purely in order to test the patch and to
reproduce the customer issue more easily (so I could see the effect by
just running top). The regression was reported by a real-life customer
workload AFAIK, and it was caused by the idle balancing dropping the
sync information.

If it were just the lat_ctx type of workload like the program I
attached I wouldn't care either, but this was a localhost UDP and TCP
(both bandwidth and latency) test that showed improvement by not
dropping the sync information through idle core balancing during
wakeups.

There is no tunable that lets people test the sync information with
real workloads: the only way is to rebuild the kernel with SCHED_MC=n
(which nobody should be doing because it has other drawbacks) or to
alter the vCPU topology. So for now we're working to restore the
standard only-sockets topology to shut off the idle balancing without
having to patch the guest scheduler, but this looked like a more
general problem that has room for improvement.

Ideally we should detect when the sync information is worth keeping
instead of always dropping it. Alternatively a sched_feat could be
added to achieve it manually.
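
To make the sched_feat idea a bit more concrete, here is a userspace
toy model of the decision, not kernel code. The flag
feat_sync_skip_idle_sibling and the helper pick_cpu() are hypothetical
names invented for this sketch, and the real wakeup path weighs far
more state than these three inputs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical feature bit standing in for a sched_feat.
 * No feature with this name is claimed to exist in the kernel.
 */
static bool feat_sync_skip_idle_sibling = true;

/*
 * Toy model of picking a CPU for the wakee.
 *
 * waker_cpu: CPU the waker is running on
 * idle_cpu:  an idle CPU sharing cache with the waker, or -1 if none
 * sync:      the waker signals it is about to sleep after the wakeup
 */
static int pick_cpu(int waker_cpu, int idle_cpu, bool sync)
{
    assert(waker_cpu >= 0);

    /* Keep the sync hint: the waker's CPU is about to become free. */
    if (sync && feat_sync_skip_idle_sibling)
        return waker_cpu;

    /* Otherwise prefer an idle sibling, dropping the sync hint. */
    if (idle_cpu >= 0)
        return idle_cpu;

    return waker_cpu;
}
```

With the flag set, a sync wakeup stays on the waker's CPU even when an
idle sibling exists; with it cleared, the model falls back to picking
the idle sibling, which is the behavior that dropped the sync
information in the cores=2 topology.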

Thanks,
Andrea