RE: [RFC/RFT][PATCH v3 0/6] sched/cpuidle: Idle loop rework

From: Doug Smythies
Date: Sat Mar 10 2018 - 11:07:51 EST


On 2018.03.10 01:00 Rafael J. Wysocki wrote:
> On Saturday, March 10, 2018 8:41:39 AM CET Doug Smythies wrote:
>>
>> With apologies to those that do not like the term "PowerNightmares",
>
> OK, and what exactly do you count as "PowerNightmares"?

I'll snip some below and then explain:

...[snip]...

>>
>> Kernel 4.16-rc4: Summary: Average processor package power 27.41 watts
>>
>> Idle State 0: Total Entries: 9096 : PowerNightmares: 6540 : Not PN time (seconds): 0.051532 : PN time: 7886.309553 : Ratio:
153037.133492
>> Idle State 1: Total Entries: 28731 : PowerNightmares: 215 : Not PN time (seconds): 0.211999 : PN time: 77.395467 : Ratio:
365.074679
>> Idle State 2: Total Entries: 4474 : PowerNightmares: 97 : Not PN time (seconds): 1.959059 : PN time: 0.874112 : Ratio: 0.446190
>> Idle State 3: Total Entries: 2319 : PowerNightmares: 0 : Not PN time (seconds): 1.663376 : PN time: 0.000000 : Ratio: 0.000000

O.K. let's go deeper than the summary, and focus on idle state 0, which has been my area of interest in this saga.

Idle State 0:
CPU: 0: Entries: 2093 : PowerNightmares: 1136 : Not PN time (seconds): 0.024840 : PN time: 1874.417840 : Ratio: 75459.655439
CPU: 1: Entries: 1051 : PowerNightmares: 721 : Not PN time (seconds): 0.004668 : PN time: 198.845193 : Ratio: 42597.513425
CPU: 2: Entries: 759 : PowerNightmares: 583 : Not PN time (seconds): 0.003299 : PN time: 1099.069256 : Ratio: 333152.246028
CPU: 3: Entries: 1033 : PowerNightmares: 1008 : Not PN time (seconds): 0.000361 : PN time: 1930.340683 : Ratio: 5347203.995237
CPU: 4: Entries: 1310 : PowerNightmares: 1025 : Not PN time (seconds): 0.006214 : PN time: 1332.497114 : Ratio: 214434.682950
CPU: 5: Entries: 1097 : PowerNightmares: 848 : Not PN time (seconds): 0.005029 : PN time: 785.366864 : Ratio: 156167.601340
CPU: 6: Entries: 1753 : PowerNightmares: 1219 : Not PN time (seconds): 0.007121 : PN time: 665.772603 : Ratio: 93494.256958

Note: CPU 7 is busy and doesn't go into idle at all.

And also look at the histograms of the times spent in idle state 0:
CPU 3 might be the most interesting:

Idle State: 0 CPU: 3:
4 1
5 3
7 2
11 1
12 1
13 2
14 3
15 3
17 4
18 1
19 2
28 2
7563 1
8012 1
9999 1006

Where:
Column 1 is the time in microseconds it was in that idle state
up to 9999 microseconds, which includes anything more.
Column 2 is the number of occurrences of that time.

Notice that 1008 times out of the 1033, it spent an excessive amount
of time in idle state 0, leading to excessive power consumption.
I adopted Thomas Ilsche's "Powernightmare" term for this several
months ago.

This CPU 3 example was pretty clear, but sometimes it is not so
obvious. I admit that my thresholds for is it a "powernigthmare" or
not are somewhat arbitrary, and I'll change them to whatever anybody
wants. Currently:

#define THRESHOLD_0 100 /* Idle state 0 PowerNightmare threshold in microseconds */
#define THRESHOLD_1 1000 /* Idle state 1 PowerNightmare threshold in microseconds */
#define THRESHOLD_2 2000 /* Idle state 2 PowerNightmare threshold in microseconds */
#define THRESHOLD_3 4000 /* Idle state 3 PowerNightmare threshold in microseconds */

Let's have a look at another example from the same test run:

Idle State 1:
CPU: 0: Entries: 3104 : PowerNightmares: 41 : Not PN time (seconds): 0.012196 : PN time: 10.841577 : Ratio: 888.945312
CPU: 1: Entries: 2637 : PowerNightmares: 40 : Not PN time (seconds): 0.013135 : PN time: 11.334686 : Ratio: 862.937649
CPU: 2: Entries: 1618 : PowerNightmares: 41 : Not PN time (seconds): 0.008351 : PN time: 10.193641 : Ratio: 1220.649147
CPU: 3: Entries: 10180 : PowerNightmares: 31 : Not PN time (seconds): 0.087596 : PN time: 14.748787 : Ratio: 168.372836
CPU: 4: Entries: 3878 : PowerNightmares: 22 : Not PN time (seconds): 0.040360 : PN time: 14.207233 : Ratio: 352.012710
CPU: 5: Entries: 3658 : PowerNightmares: 1 : Not PN time (seconds): 0.031188 : PN time: 0.604176 : Ratio: 19.372066
CPU: 6: Entries: 3656 : PowerNightmares: 39 : Not PN time (seconds): 0.019173 : PN time: 15.465367 : Ratio: 806.622179

Idle State: 1 CPU: 2:
0 230
1 566
2 161
3 86
4 61
5 13
6 32
7 37
8 42
9 41
10 4
11 41
12 38
13 24
14 27
15 26
16 5
17 21
18 16
19 17
20 15
21 1
22 12
23 17
24 16
25 11
26 2
27 5
28 5
29 3
35 1
47 1
1733 1
1850 1
2027 1
3929 1
9999 37

The 41 "Powernightmares" out of 1618 seems correct to me.
Even if someone claims that the threshold should have been >3929 uSec,
there are still 37 "powenightmares".

>>>
>> Graph of package power verses time: http://fast.smythies.com/rjwv3_100.png
>
> The graph actually shows an improvement to my eyes, as the blue line is quite
> consistently above the red one except for a few regions (and I don't really
> understand the drop in the blue line by the end of the test window).

Agreed, it shows improvement, as does the average package power.

The roughly 22 minute drop in the reference test, unmodified kernel 4.16-rc4,
the blue line, is something that has been driving me crazy since the start
of this work and is the reason my tests run for so long (even 4 hours is short).
It all has to do with subtle timing changes of events. Sometimes events lineup
in such a way that there are lots of these "powernightmares" and sometimes
events lineup in such way that there are not. These conditions can persist for
hours at a time. By "events" I mean ones that ask for a resched upon exit and ones
that don't (I think, I am going from memory here and haven't found my notes).
I have the trace data and can look in more detail if you want.

... Doug