Re: [PATCH 09/16] sched/fair: Let asymmetric cpu configurations balance at wake-up

From: Morten Rasmussen
Date: Wed Jun 08 2016 - 07:29:00 EST


On Thu, Jun 02, 2016 at 04:21:05PM +0200, Peter Zijlstra wrote:
> On Mon, May 23, 2016 at 11:58:51AM +0100, Morten Rasmussen wrote:
> > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
> > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
> > configurations SD_WAKE_AFFINE is only desirable if the waking task's
> > compute demand (utilization) is suitable for the cpu capacities
> > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
> > balancing take over (find_idlest_{group, cpu}()).
> >
> > The assumption is that SD_WAKE_AFFINE is never set for a sched_domain
> > containing cpus with different capacities. This is enforced by a
> > previous patch based on the SD_ASYM_CPUCAPACITY flag.
> >
> > Ideally, we shouldn't set 'want_affine' in the first place, but we don't
> > know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start
> > traversing them.
>
> I'm a bit confused...
>
> Lets assume a 2+2 big.little thing with shared LLC:
>
>
>    ---------- SD2 ----------
>
>    -- SD1 --      -- SD1 --
>
>     0     1        2     3
>
>
> SD1: WAKE_AFFINE, BALANCE_WAKE
> SD2: ASYM_CAPACITY, BALANCE_WAKE
>
> t0 used to run on cpu1, t1 used to run on cpu2
>
> cpu0 wakes t0:
>
>   want_affine = 1
>   SD1:
>     WAKE_AFFINE
>     cpumask_test_cpu(prev_cpu, sd_mask) == true
>     affine_sd = SD1
>     break;
>
>   affine_sd != NULL -> affine-wakeup
>
> cpu0 wakes t1:
>
>   want_affine = 1
>   SD1:
>     WAKE_AFFINE
>     cpumask_test_cpu(prev_cpu, sd_mask) == false
>   SD2:
>     BALANCE_WAKE
>     sd = SD2
>
>   affine_sd == NULL, sd == SD2 -> find_idlest_*()
>
>
> All without this patch...
>
> So what is this thing doing?

Not very much in those cases, but it makes one important difference in
one case. We could do fine without the patch if we could assume that all
tasks already run in the right SD according to their PELT utilization,
and that tasks which don't are woken up by a cpu in the right SD (so we
take the find_idlest_*() route). But we can't :-(

Let's take your example above and add that t0 should really be running
on cpu2/3 due to its utilization, assuming SD1[01] are little cpus and
SD1[23] are big cpus. In that case we would still do affine-wakeup and
stick the task on cpu0 despite it being a little cpu.

To avoid that, this patch sets want_affine = 0 in that case so we take
the find_idlest_*() route and give the task a chance of being put on
cpu2/3. The patch also sets want_affine = 0 in other cases which
already take the find_idlest_*() route due to the cpumask test, as
illustrated by your example above.

These are the scenarios we can have:

b = big cpu capacity/task util
l = little cpu capacity/task util
x = don't care

case    task util    prev_cpu    this_cpu    wakeup
---------------------------------------------------
  1         b            b           b       affine (b)
  2         b            b           l       slow (b)
  3         b            l           b       slow (b)
  4         b            l           l       slow (b)
  5         l            b           b       affine (x)
  6         l            b           l       slow (x)
  7         l            l           b       slow (x)
  8         l            l           l       affine (x)

Without the patch we would do an affine wakeup on a little cpu in case
4, where we want the task to wake up on a big cpu. With the patch we
only do an affine wakeup when this_cpu and prev_cpu have the same
capacity and that capacity is sufficient for the task.

Vincent pointed out that this is overly restrictive, as it is perfectly
safe to do an affine wakeup in cases 6 and 7, where the waking cpu and
the previous cpu both have sufficient capacity but their capacities
differ.

If we made wake_affine() consider cpu capacity, it should be possible
to do an affine wakeup even in cases 2 and 3, leaving only case 4
requiring the find_idlest_*() route.

There are more cases for taking the slow wakeup path if you have more
than two cpu capacities to deal with, but I'm going to spare you the
full detailed table ;-)