Re: [PATCH] sched: Introduce scaled capacity awareness in enqueue

From: Rohit Jain
Date: Mon Jun 12 2017 - 20:28:53 EST


On 06/12/2017 08:36 AM, Peter Zijlstra wrote:

On Mon, Jun 05, 2017 at 11:08:38AM -0700, Rohit Jain wrote:

Why does determining if a CPU's capacity is scaled down need to involve
global data? AFAICT it's a purely CPU-local affair.

The global array is used to determine the threshold capacity: any CPU
whose capacity lies below it is considered to be 'running low' on
available capacity. This threshold could also be statically defined as
a fixed fraction, but calculating it dynamically works for all
benchmarks.
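To make the two options concrete, here is a minimal userspace sketch.
All names are hypothetical. The dynamic variant assumes the threshold is
the mean of the per-CPU scaled capacities held in the global array, and
the static variant assumes the fixed fraction is taken of the full
1024-unit capacity scale. The patch itself may compute either one
differently.

```c
#include <stddef.h>

/* Hypothetical: mirrors the kernel's 1024-unit capacity scale. */
#define CAPACITY_SCALE 1024UL

/* Static option (assumed): "running low" below pct% of full scale. */
static unsigned long static_threshold(unsigned int pct)
{
	return CAPACITY_SCALE * pct / 100;
}

/* Dynamic option (assumed): mean scaled capacity over a global
 * per-CPU array. Every call reads every CPU's entry. */
static unsigned long dynamic_threshold(const unsigned long *cap,
				       size_t nr_cpus)
{
	unsigned long sum = 0;
	size_t i;

	if (nr_cpus == 0)
		return 0;
	for (i = 0; i < nr_cpus; i++)
		sum += cap[i];
	return sum / nr_cpus;
}

/* A CPU is treated as running low if it is below the threshold. */
static int running_low(unsigned long cpu_cap, unsigned long threshold)
{
	return cpu_cap < threshold;
}
```

With capacities {1024, 1024, 512, 768} the dynamic threshold is 832, so
the CPU at 768 counts as running low, while a static 75% cutoff (768)
would let it through.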

What scenario is important for what workload?

The real workload that produces these conditions is OLTP. It mixes
computation and synchronization with ongoing interrupt activity, and
different CPUs see heavy soft IRQ activity at different times.

I simulated this by running 'ping' to generate IRQ activity while a
barrier synchronization program was running. If threads end up on a
CPU with a lot of soft IRQ activity, those delayed threads end up
delaying the other threads at the barrier as well.
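One slow thread hurts everyone because a barrier step only completes
when the last thread arrives. A minimal sketch with made-up per-thread
times (in microseconds) shows the effect. The function name and the
numbers are illustrative only.

```c
#include <stddef.h>

/* Hypothetical helper: a barrier step ends when the slowest thread
 * arrives, so its duration is the maximum per-thread time. */
static unsigned int barrier_step_us(const unsigned int *thread_us,
				    size_t nr_threads)
{
	unsigned int max = 0;
	size_t i;

	for (i = 0; i < nr_threads; i++)
		if (thread_us[i] > max)
			max = thread_us[i];
	return max;
}
```

With four threads that each need 100 us, and one of them losing 30 us to
soft IRQ processing on its CPU, the step takes 130 us for all four
threads instead of 100 us.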

Since the interrupt activity keeps changing in the real workload, we
still want to avoid the CPUs with 'really heavy' IRQ activity. In cases
where all the idle CPUs have similar scaled capacity available, we would
end up iterating over all the CPUs in the LLC domain to find a suitable
one.
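A rough sketch of the kind of search being discussed, using assumed
names: walk the CPUs of the LLC domain, return the first idle CPU whose
remaining capacity meets the threshold, and otherwise fall back to the
first idle CPU seen. When every idle CPU is below the threshold, the
loop examines the whole domain before falling back. This is not the
kernel's select_idle_sibling() code.

```c
#include <stddef.h>

/* Hypothetical per-CPU snapshot. */
struct cpu_state {
	int idle;               /* nonzero if the CPU is idle */
	unsigned long capacity; /* capacity left after IRQ/RT time */
};

/*
 * Return the first idle CPU with capacity >= threshold, else the first
 * idle CPU seen, else -1. *scanned reports how many CPUs were examined.
 */
static int pick_idle_cpu(const struct cpu_state *llc, size_t nr,
			 unsigned long threshold, size_t *scanned)
{
	int fallback = -1;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (!llc[i].idle)
			continue;
		if (llc[i].capacity >= threshold) {
			*scanned = i + 1;
			return (int)i;
		}
		if (fallback < 0)
			fallback = (int)i;
	}
	*scanned = nr;
	return fallback;
}
```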

Thirdly, did you consider heterogeneous setups, where CPUs can have
different capacities?

My assumption here is that we still want to choose the CPUs with higher
capacities. Please correct me if I am wrong here.
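One way to keep the test CPU-local and still cope with heterogeneous
systems is to compare each CPU's current capacity against its own
original capacity rather than against a global value. This is a
hypothetical sketch. The struct fields loosely mirror the kernel's split
between a CPU's original and current capacity, but the names and the
percentage cutoff are assumptions.

```c
/* Hypothetical per-CPU capacities on a 1024-unit scale. */
struct cpu_cap {
	unsigned long orig; /* capacity with no IRQ/RT pressure */
	unsigned long cur;  /* capacity currently left for CFS tasks */
};

/* CPU-local test: scaled down if cur < pct% of this CPU's own orig.
 * No global data is read. */
static int capacity_scaled_down(const struct cpu_cap *c, unsigned int pct)
{
	return c->cur * 100 < c->orig * pct;
}

/* Separate decision: between two candidates, prefer more capacity. */
static const struct cpu_cap *more_capable(const struct cpu_cap *a,
					  const struct cpu_cap *b)
{
	return a->cur >= b->cur ? a : b;
}
```

With an 80% cutoff, a big CPU at 700/1024 counts as scaled down while a
little CPU at 440/446 does not, yet the big CPU still has more absolute
capacity left. Detecting a scaled-down CPU and preferring a
higher-capacity CPU are therefore separate questions.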

And clearly you did not consider the 4K CPUs case: 4K CPUs poking at a
global data structure will completely come apart.

Since the global structure is only accessed during load balancing, the
cost could (possibly) be OK?

Did you mean we should use a static cutoff to decide whether a CPU
should be treated as low on capacity, and skip it during the idle CPU
search?

Yes. Why would that not suffice? You've not even begun to explain why
you need all the additional complexity.

In cases where the IRQ activity is evenly spread out across all the
CPUs, every CPU may fall below a fixed cutoff, which could cause us to
iterate over the entire LLC scheduling domain.
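The concern can be illustrated with a toy comparison. All names and
numbers are hypothetical, and every CPU is assumed idle. If IRQ load
leaves every CPU at 700/1024, a fixed 75% cutoff (768) rejects all of
them and the whole domain is scanned. A threshold derived from the mean
capacity (assumed here as the dynamic variant) accepts the first CPU.

```c
#include <stddef.h>

/* CPUs examined before one at or above threshold is found
 * (nr if none qualifies, i.e. the whole domain was scanned). */
static size_t cpus_scanned(const unsigned long *cap, size_t nr,
			   unsigned long threshold)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (cap[i] >= threshold)
			return i + 1;
	return nr;
}

/* Assumed dynamic threshold: mean capacity across the domain. */
static unsigned long mean_capacity(const unsigned long *cap, size_t nr)
{
	unsigned long sum = 0;
	size_t i;

	if (nr == 0)
		return 0;
	for (i = 0; i < nr; i++)
		sum += cap[i];
	return sum / nr;
}
```

The trade-off is the one raised above: the mean needs data from every
CPU, while the fixed cutoff is purely local but degrades to a full scan
when all CPUs are evenly loaded.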

I will test the 'static cutoff' changes against real workloads and send
v2 accordingly.