Re: [PATCH] Optimize int_sqrt for small values for faster idle

From: Rasmus Villemoes
Date: Mon Feb 01 2016 - 16:25:32 EST


On Thu, Jan 28 2016, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:

> From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>
> The menu cpuidle governor does at least two int_sqrt() each time
> we go into idle in get_typical_interval to compute stddev
>
> int_sqrts take 100-120 cycles each. Short idle latency is important
> for many workloads.
>

If you want to optimize get_typical_interval(), why not just take the
square root out of the equation (literally)?

Something like

From: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
Date: Mon, 1 Feb 2016 21:43:18 +0100
Subject: [PATCH] cpuidle: menu: avoid expensive square root computation

Computing the integer square root is a rather expensive operation, at
least compared to doing a 64x64 -> 64 multiply (avg*avg) and, on 64
bit platforms, doing an extra comparison to a constant (variance <=
U64_MAX/36).

On 64 bit platforms, this does mean that we add a restriction on the
range of the variance where we end up using the estimate (since
previously the stddev <= ULONG_MAX was a tautology), but on the other
hand, we extend the range quite substantially on 32 bit platforms - in
both cases, we now allow standard deviations up to 715 seconds, which
is for example guaranteed if all observations are less than 1430
seconds.

Signed-off-by: Rasmus Villemoes <linux@xxxxxxxxxxxxxxxxxx>
---
drivers/cpuidle/governors/menu.c | 35 +++++++++++++++++------------------
1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 0742b3296673..beef7ae123ba 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -200,7 +200,7 @@ static void get_typical_interval(struct menu_device *data)
{
int i, divisor;
unsigned int max, thresh;
- uint64_t avg, stddev;
+ uint64_t avg, variance;

thresh = UINT_MAX; /* Discard outliers above this value */

@@ -224,36 +224,35 @@ again:
else
do_div(avg, divisor);

- /* Then try to determine standard deviation */
- stddev = 0;
+ /* Then try to determine variance */
+ variance = 0;
for (i = 0; i < INTERVALS; i++) {
unsigned int value = data->intervals[i];
if (value <= thresh) {
int64_t diff = value - avg;
- stddev += diff * diff;
+ variance += diff * diff;
}
}
if (divisor == INTERVALS)
- stddev >>= INTERVAL_SHIFT;
+ variance >>= INTERVAL_SHIFT;
else
- do_div(stddev, divisor);
+ do_div(variance, divisor);

/*
- * The typical interval is obtained when standard deviation is small
- * or standard deviation is small compared to the average interval.
- *
- * int_sqrt() formal parameter type is unsigned long. When the
- * greatest difference to an outlier exceeds ~65 ms * sqrt(divisor)
- * the resulting squared standard deviation exceeds the input domain
- * of int_sqrt on platforms where unsigned long is 32 bits in size.
- * In such case reject the candidate average.
+ * The typical interval is obtained when standard deviation is
+ * small (stddev <= 20 us, variance <= 400 us^2) or standard
+ * deviation is small compared to the average interval (avg >
+ * 6*stddev, avg^2 > 36*variance). The average is smaller than
+ * UINT_MAX aka U32_MAX, so computing its square does not
+ * overflow a u64. We simply reject this candidate average if
+ * the standard deviation is greater than 715 s (which is
+ * rather unlikely).
*
* Use this result only if there is no timer to wake us up sooner.
*/
- if (likely(stddev <= ULONG_MAX)) {
- stddev = int_sqrt(stddev);
- if (((avg > stddev * 6) && (divisor * 4 >= INTERVALS * 3))
- || stddev <= 20) {
+ if (likely(variance <= U64_MAX/36)) {
+ if (((avg*avg > variance*36) && (divisor * 4 >= INTERVALS * 3))
+ || variance <= 400) {
if (data->next_timer_us > avg)
data->predicted_us = avg;
return;
--
2.6.1