CFS vs. cpufreq/cstates vs. latency

From: Rik van Riel
Date: Tue Jul 17 2012 - 10:23:39 EST

While tracking down a latency issue with communication between
KVM guests, we ran into a very interesting issue, an interplay
of CFS and power saving code.

About 3/4 of the 230us latency came from CPUs waking up out of
C-states. Disabling C-states reduced the latency to 60us.

The issue? The communication is between various threads and
processes, each of which last ran on a CPU that is now in a
deeper C-state. Because the wakeups happen one after another,
the latency added by that is "CPU wakeup latency * NR CPUs woken".
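
As a rough illustration of that formula, here is a minimal sketch in
Python. The 230us and 60us figures are the ones reported above; the
function name and the "2 CPUs at 85us each" split are hypothetical,
chosen only so the numbers add up, and are not measurements.

```python
def serialized_wakeup_latency_us(base_us, cpu_exit_latency_us, nr_cpus_woken):
    """Latency of a chain of wakeups where each hop first has to
    bring an idle CPU out of its C-state, one hop after another."""
    return base_us + cpu_exit_latency_us * nr_cpus_woken


# Figures taken from the report above.
measured_total_us = 230   # C-states enabled
measured_base_us = 60     # C-states disabled

# Share of the total latency attributable to C-state exits (about 3/4).
cstate_share = (measured_total_us - measured_base_us) / measured_total_us

# Hypothetical split (an assumption, not measured): 2 CPUs woken in
# sequence, 85us exit latency each, would account for the 170us gap.
example_total_us = serialized_wakeup_latency_us(measured_base_us, 85, 2)
```

The point of the model is that the penalty scales with the number of
idle CPUs in the communication path, not just with the depth of the
C-state.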

This problem could be common to many different multi-threaded
or multi-process applications. It looks like something that
would be fixable at the scheduler + cpufreq level.

Specifically, waking up some process requires that the CPU
which is running the wakeup is already in C0 state. If the
CPU on which the to-be-woken task last ran is in a deep
C-state, it may make sense to simply run the woken-up task
on the local CPU, not the CPU where it ran originally.
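
The idea can be sketched as a small decision function. This is only an
illustration of the heuristic described above, not existing scheduler
code: the function name, the per-CPU exit latency table and the
threshold parameter are all assumptions made for the example.

```python
def select_wake_cpu(waker_cpu, prev_cpu, exit_latency_us, max_exit_latency_us):
    """Pick the CPU on which to run a task that is being woken up.

    exit_latency_us: hypothetical dict mapping CPU id -> exit latency
        of the idle state that CPU is currently in (0 when in C0).
    max_exit_latency_us: hypothetical tunable; above this, waking the
        previous CPU is considered too expensive.
    """
    # The waker is running, so it is in C0 by definition.
    if exit_latency_us.get(prev_cpu, 0) > max_exit_latency_us:
        # Previous CPU is in a deep C-state: stay on the local CPU.
        return waker_cpu
    # Previous CPU is awake or in a shallow state: keep cache affinity.
    return prev_cpu
```

For example, select_wake_cpu(0, 3, {0: 0, 3: 100}, 20) picks CPU 0,
while a previous CPU with a 1us exit latency would be kept. The sketch
ignores the cost of losing cache affinity and the load on the waker
CPU, both of which a real policy would have to weigh.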

I seem to remember some scheduling code that (for power
saving reasons) tried running all the tasks on one CPU,
until that CPU got busy, and then spilled over onto other
CPUs.
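
A minimal sketch of such a packing policy, as described above, might
look like this. It is not the code being recalled; the function name,
the load representation (a fraction between 0 and 1 per CPU) and the
busy threshold are hypothetical.

```python
def pick_cpu_packed(cpu_load, busy_threshold):
    """Place a task on the lowest-numbered CPU that is not yet busy,
    so that work spills onto the next CPU only once earlier ones fill up.

    cpu_load: list of per-CPU load values, indexed by CPU id
        (hypothetical units, e.g. utilization between 0.0 and 1.0).
    busy_threshold: hypothetical load above which a CPU counts as busy.
    """
    for cpu, load in enumerate(cpu_load):
        if load < busy_threshold:
            return cpu
    # Every CPU is busy: fall back to the least loaded one.
    return min(range(len(cpu_load)), key=cpu_load.__getitem__)
```

With a threshold of 0.8, loads of [0.5, 0.0, 0.0] keep the task on
CPU 0, and [0.9, 0.1, 0.0] spills it over to CPU 1.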

I do not seem to be able to find that code in recent kernels,
but I have the feeling that a policy like that (related to
WAKE_AFFINE scheduling?) could improve this issue.

As an additional benefit, it could further improve power
savings.

What do the scheduler and cpufreq people think about this
problem?

Any preferred ways to solve the "N * cpu wakeup latency"
problem that is plaguing multi-process and multi-threaded
workloads?

--
All rights reversed