Re: group scheduler regression since 4.3 (bisect 9d89c257d sched/fair: Rewrite runnable load and utilization average tracking)

From: Peter Zijlstra
Date: Mon Sep 26 2016 - 08:10:56 EST


On Mon, Sep 26, 2016 at 02:01:43PM +0200, Christian Borntraeger wrote:
> They applied ok on next from 9/13. Things go even worse.
> With this host configuration:
>
> CPU NODE BOOK SOCKET CORE L1d:L1i:L2d:L2i ONLINE CONFIGURED ADDRESS
> 0   0    0    0      0    0:0:0:0         yes    yes        0
> 1   0    0    0      0    1:1:1:1         yes    yes        1
> 2   0    0    0      1    2:2:2:2         yes    yes        2
> 3   0    0    0      1    3:3:3:3         yes    yes        3
> 4   0    0    1      2    4:4:4:4         yes    yes        4
> 5   0    0    1      2    5:5:5:5         yes    yes        5
> 6   0    0    1      3    6:6:6:6         yes    yes        6
> 7   0    0    1      3    7:7:7:7         yes    yes        7
> 8   0    0    1      4    8:8:8:8         yes    yes        8
> 9   0    0    1      4    9:9:9:9         yes    yes        9
> 10  0    0    1      5    10:10:10:10     yes    yes        10
> 11  0    0    1      5    11:11:11:11     yes    yes        11
> 12  0    0    1      6    12:12:12:12     yes    yes        12
> 13  0    0    1      6    13:13:13:13     yes    yes        13
> 14  0    0    1      7    14:14:14:14     yes    yes        14
> 15  0    0    1      7    15:15:15:15     yes    yes        15
>
> the guest was running either on 0-3 or on 4-15, but never
> used the full system. With group scheduling disabled everything was good
> again. So it looks like this bug also has some dependency on the
> host topology.

OK, so CPU affinities that unevenly straddle topology boundaries like
that are hard to handle (and are generally not recommended), but it's
not immediately obvious why it would be so much worse with cgroups enabled.
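
For reference, a minimal sketch that tallies how a CPU set spreads over
the sockets in the table quoted above. The CPU-to-socket mapping is
copied from that table; the helper name socket_spread() is made up for
illustration and is not anything from the kernel or from lscpu.

```python
from collections import Counter

# CPU -> SOCKET mapping copied from the quoted table:
# CPUs 0-3 sit on socket 0, CPUs 4-15 on socket 1.
CPU_TO_SOCKET = {cpu: (0 if cpu < 4 else 1) for cpu in range(16)}


def socket_spread(cpus):
    """Return {socket: number of CPUs from `cpus` on that socket}."""
    return dict(Counter(CPU_TO_SOCKET[c] for c in cpus))


print(socket_spread(range(0, 4)))   # {0: 4}
print(socket_spread(range(4, 16)))  # {1: 12}
print(socket_spread(range(16)))     # {0: 4, 1: 12}
```

The last line shows the imbalance in question: the two sockets hold 4
and 12 CPUs, so any set spanning both is split unevenly.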