Re: [PATCH 0/1] RFC: sched/fair: skip select_idle_sibling() in presence of sync wakeups

From: Mike Galbraith
Date: Tue Jan 08 2019 - 23:28:05 EST


On Tue, 2019-01-08 at 22:49 -0500, Andrea Arcangeli wrote:
> Hello,
>
> we noticed some unexpected performance regressions in the scheduler by
> switching the guest CPU topology from "-smp 2,sockets=2,cores=1" to
> "-smp 2,sockets=1,cores=2".
>
> With sockets=2,cores=1 localhost message passing (pipes, AF_UNIX etc..)
> runs serially at 100% CPU load of a single vcpu with optimal
> performance. With sockets=1,cores=2 the load is spread across both
> vcpus and performance is reduced.
>
> With SCHED_MC=n on older kernels the problem goes away (but that's far
> from ideal for heavily multithreaded workloads which then regress)
> because that basically disables the last part of select_idle_sibling().
>
> On bare metal with SCHED_MC=y on any recent multicore CPU the
> scheduler (as expected) behaves like sockets=1,cores=2, so it won't
> run the tasks serially.
>
> The reason is that select_idle_sibling() can disregard the "sync" hint
> and all decisions done up to that point and at the last minute it can
> decide to move the waken task to an arbitrary idle core.
>
> To test the above theory I implemented this patch which seems to
> confirm the reason the tasks won't run serially anymore with
> sockets=1,cores=2 is select_idle_sibling() overriding the "sync" hint.
>
> You worked on the wake_affine() before so you may want to review this
> issue, if you agree these sync workloads should run serially even in
> presence of idle cores in the system. I don't know if the current
> behavior is on purpose but if it is, it'd be interesting to know
> why. This is just a RFC.
>
> To test I used this trivial program.

Which highlights the problem. That proggy really is synchronous, but
the sync hint is applied to many MANY real world cases where this is
not the case at all. Sure, you can make things like pipe_test and
nearly nonexistent payload TCP_RR numbers look gorgeous, but that
demolishes concurrency for real applications.

Even when any given wakeup really is a truly synchronous wakeup, does,
say, an application sending a byte down a pipe necessarily mean it's
not going to then execute in parallel with the recipient of that byte?
I think not. IMO, the sync hint should die. The only time it is a win
is for applications that do essentially nothing but context switch. I
don't think the real world plays a lot of high speed ping pong.

> /*
> * pipe-loop.c
> *
> * Copyright (C) 2019 Red Hat, Inc.
> *
> * This work is licensed under the terms of the GNU GPL, version 2.
> */
>
> #include <stdio.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/wait.h>
>
> int main(int argc, char ** argv)
> {
> char buf[1];
> int n = 1000000;
>
> int pipe1[2], pipe2[2];
>
> pipe(pipe1);
> pipe(pipe2);
>
> if (fork()) {
> while (n--) {
> read(pipe1[0], buf, 1);
> write(pipe2[1], buf, 1);
> }
> wait(NULL);
> } else {
> while (n--) {
> write(pipe1[1], buf, 1);
> read(pipe2[0], buf, 1);
> }
> }
>
> return 0;
> }
>
> Andrea Arcangeli (1):
> sched/fair: skip select_idle_sibling() in presence of sync wakeups
>
> kernel/sched/fair.c | 13 ++++++++-----
> 1 file changed, 8 insertions(+), 5 deletions(-)