Re: [RFC PATCH 0/4] Support for passing runtime state idle time to TF-A

From: Sowjanya Komatineni
Date: Fri Apr 23 2021 - 18:25:15 EST



On 4/23/21 1:16 PM, Lukasz Luba wrote:
Hi Sowjanya,

On 4/22/21 9:30 PM, Sowjanya Komatineni wrote:
The Tegra194 and Tegra186 platforms use a separate MCE firmware for the CPUs, which is
in charge of deciding on state transitions based on the target state, the state idle
time, and other Tegra CPU core cluster state information.

The current PSCI specification does not define a function for passing the runtime
state idle time predicted by the governor (based on next events and state target
residency) to ARM Trusted Firmware.

Do you have some numbers from experiments showing that these idle
governor prediction values, which are passed from the kernel to the MCE
firmware, are making a good 'guess'?
How much precision (1us? 1ms?) in the values do you need there?

It could also be a few ms, depending on when the next CPU event/activity happens, which the MCE firmware has no visibility into.


IIRC (probably from Rafael's presentations), predicting something like
CPU idle residency in the kernel is not a trivial thing.

Another idea (depending on DT structure and PSCI bits):
Could this be solved differently, simply by knowing that if
the governor requested some C-state, the governor 'predicted'
an idle residency greater than the min_residency attached to this
C-state?
Then, when that request shows up in your FW, you know that it must be at
least min_residency because of this C-state ID.
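A minimal sketch of that idea, assuming the firmware keeps its own copy of each state's min_residency keyed by the requested state ID. The struct, the table contents (state IDs 0x1 and 0x6, 2000 us) and the function name are hypothetical and do not reflect Tegra's real values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-state table held by the firmware.
 * IDs and residencies are illustrative only. */
struct idle_state_desc {
	uint32_t state_id;
	uint32_t min_residency_us;
};

static const struct idle_state_desc idle_states[] = {
	{ 0x1, 0 },    /* e.g. C1 (WFI): no residency requirement */
	{ 0x6, 2000 }, /* e.g. C6: made-up min_residency */
};

/* The governor only picks a state whose min_residency fits its
 * prediction, so the requested state's min_residency is a lower
 * bound on the predicted idle time. Unknown IDs yield 0 (no info). */
uint32_t implied_idle_lower_bound_us(uint32_t state_id)
{
	for (size_t i = 0; i < sizeof(idle_states) / sizeof(idle_states[0]); i++)
		if (idle_states[i].state_id == state_id)
			return idle_states[i].min_residency_us;
	return 0;
}
```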

C6 is the only deep state we support for the Tegra194 Carmel CPU, in addition to the C1 (WFI) idle state.

The MCE firmware gets the state crossover thresholds for the C1-to-C6 transition from TF-A and uses them, along with the state idle time, to decide on C6 entry based on its background work.

For now, if we use min_residency (a static value from DT) as the state idle time, the firmware always enters the deepest state, C6, because the min_residency value we use is always higher than the state crossover threshold.

But the MCE firmware does not know when the next CPU event will happen, so it cannot predict whether the idle period will last longer than the state's min_residency.

Using min_residency in such a case is very conservative: the MCE firmware exits C6 early, and we may not get the better power savings.

If the MCE firmware knows when the next event will happen, it can use that to stay in C6 without an early exit, for better power savings.
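To make the trade-off concrete, here is a toy model. It is not how the MCE firmware actually works (its algorithm is not described in this thread); it assumes, hypothetically, that the firmware enters C6 only when the idle-time hint exceeds the C1-to-C6 crossover threshold, and leaves C6 once either the real wake-up event arrives or the hinted time runs out. Function names and numbers are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed entry rule: C6 only pays off above the crossover threshold
 * that TF-A hands to the firmware. */
bool mce_should_enter_c6(uint32_t hint_us, uint32_t crossover_us)
{
	return hint_us > crossover_us;
}

/* Time spent in C6 under the assumed model: zero if C6 is not entered,
 * otherwise until the real event or the end of the hinted idle time,
 * whichever comes first (the latter being the "early exit"). */
uint32_t modeled_c6_time_us(uint32_t hint_us, uint32_t actual_idle_us,
			    uint32_t crossover_us)
{
	if (!mce_should_enter_c6(hint_us, crossover_us))
		return 0;
	return hint_us < actual_idle_us ? hint_us : actual_idle_us;
}
```

With a 500 us crossover, a static min_residency hint of 2000 us and a real idle period of 8000 us, the model keeps the CPU in C6 for only 2000 us; a governor-predicted hint of 8000 us keeps it there for the full 8000 us.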

It would depend on the number of available states, max_residency, and the scale
that you would choose while assigning values from [0, max_residency]
to each state.
IIRC there can be many state IDs for idle, so it would depend on the
number of bits encoding this state, and your needs. Example of a
linear scale:
4 bits encoding the idle state and a max predicted residency of 10 ms
means 10000 us / 16 states = 625 us/state.
The max_residency might be split differently, using a non-linear
function, to make some range more precise.
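The linear example can be sketched as follows, assuming the predicted residency is quantized into a 4-bit code (the IDLE_HINT_* names and both functions are hypothetical). Each step is 10000 us / 16 = 625 us, predictions at or above the last step saturate to code 15, and decoding returns the lower edge of the bucket, which is the residency the firmware could safely rely on.

```c
#include <assert.h>
#include <stdint.h>

#define IDLE_HINT_BITS    4u
#define IDLE_HINT_LEVELS  (1u << IDLE_HINT_BITS)               /* 16 codes */
#define IDLE_HINT_MAX_US  10000u                               /* 10 ms */
#define IDLE_HINT_STEP_US (IDLE_HINT_MAX_US / IDLE_HINT_LEVELS) /* 625 us */

/* Quantize a predicted residency into a 4-bit code (linear scale). */
uint32_t idle_hint_encode(uint32_t predicted_us)
{
	uint32_t code = predicted_us / IDLE_HINT_STEP_US;

	return code < IDLE_HINT_LEVELS ? code : IDLE_HINT_LEVELS - 1;
}

/* Lower edge of the bucket: a guaranteed minimum at 625 us granularity. */
uint32_t idle_hint_decode_us(uint32_t code)
{
	assert(code < IDLE_HINT_LEVELS);
	return code * IDLE_HINT_STEP_US;
}
```

A non-linear split (e.g. finer steps for short residencies) would only change how the step boundaries are computed in these two functions.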

An open question is whether all these idle states must be represented
in DT, or whether there is a way of describing a 'set of idle states'
automatically.

We only support the C6 state through DT, as C6 is the only deep state for the Tegra194 Carmel CPU. The WFI idle state is handled entirely by the kernel and does not require MCE sequences for entry/exit.

Regards,
Lukasz