RE: [RFC/RFT][PATCH v3] cpuidle: New timer events oriented governor for tickless systems

From: Doug Smythies
Date: Sun Nov 11 2018 - 22:49:03 EST


On 2018.11.10 13:48 Doug Smythies wrote:
> On 2018.11.07 09:04 Doug Smythies wrote:
>
>> The Phoronix dbench test was run under the option to run all
>> the tests, instead of just one number of clients. This was done
>> with a reference/baseline kernel of 4.20-rc1, and also with this
>> TEO version 3 patch. The tests were also repeated with trace
>> enabled for 5000 seconds. Idle information and processor
>> package power were sampled once per minute in all test runs.
>>
>> The results are:
>> http://fast.smythies.com/linux-pm/k420/k420-dbench-teo3.htm
>> http://fast.smythies.com/linux-pm/k420/histo_compare.htm
>
> Another observation from the data, and for the reference/
> baseline 4.20-rc1 kernel, idle state 0 histogram plots
> is that there are several (280) long idle durations.
>
> For unknown reasons, these are consistently dominated by
> CPU 5 on my system (264 of the 280, in this case).
>
> No other test that I have tried shows this issue,
> But other tests also tend to set the need_resched
> flag whereas the dbench test doesn't.
>
> Older kernels also have the issue. I tried: 4.19, 4.18
> 4.17, 4.16+"V9" idle re-work patch set of the time.
> There is no use going back further, because "V9" was
> to address excessively long durations in shallow idle states.
>
> I have not made progress towards determining the root issue.

For whatever reason, sometimes a softirq_entry occurs and it is
a considerably longer than usual time until the softirq_exit.
Sometimes there are bunch of interrupts piled up, and sometimes
some delay between the softirq_entry and anything else happening,
which perhaps just means I didn't figure out what else to enable
in the trace to observe it.

Anyway, it seems likely that there is no problem here after all.

... Doug