Re: [PATCH v3 6/7] soc: qcom: Add RPMh Power domain driver

From: Ulf Hansson
Date: Sat Jun 16 2018 - 08:13:48 EST


On 15 June 2018 at 23:46, David Collins <collinsd@xxxxxxxxxxxxxx> wrote:
> Hello Ulf,
>
> On 06/15/2018 02:25 AM, Ulf Hansson wrote:
>> On 14 June 2018 at 20:17, David Collins <collinsd@xxxxxxxxxxxxxx> wrote:
>>> On 06/13/2018 11:54 PM, Rajendra Nayak wrote:
>>>> On 06/14/2018 06:02 AM, David Collins wrote:
>>>>> On 06/11/2018 09:40 PM, Rajendra Nayak wrote:
>>> ...
>>>>>> +static int rpmhpd_power_off(struct generic_pm_domain *domain)
>>>>>> +{
>>>>>> + struct rpmhpd *pd = domain_to_rpmhpd(domain);
>>>>>> + int ret = 0;
>>>>>> +
>>>>>> + mutex_lock(&rpmhpd_lock);
>>>>>> +
>>>>>> + if (pd->level[0] == 0)
>>>>>> + ret = rpmhpd_aggregate_corner(pd, 0);
>>>>>
>>>>> I'm not sure that we want to have the 'pd->level[0] == 0' check,
>>>>> especially when considering aggregation with the peer pd. I understand
>>>>> its intention to try to keep enable state and level setting orthogonal.
>>>>> However, as it stands now, the final request sent to hardware would differ
>>>>> depending upon the order of calls. Consider the following example.
>>>>>
>>>>> Initial state:
>>>>> pd->level[0] == 0
>>>>> pd->corner = 5, pd->enabled = true, pd->active_only = false
>>>>> pd->peer->corner = 7, pd->peer->enabled = true, pd->peer->active_only = true
>>>>>
>>>>> Outstanding requests:
>>>>> RPMH_ACTIVE_ONLY_STATE = 7, RPMH_WAKE_ONLY_STATE = 7, RPMH_SLEEP_STATE = 5
>>>>>
>>>>> Case A:
>>>>> 1. set pd->corner = 6
>>>>> --> new value request: RPMH_SLEEP_STATE = 6
>>>>> --> duplicate value requests: RPMH_ACTIVE_ONLY_STATE = 7,
>>>>> RPMH_WAKE_ONLY_STATE = 7
>>>>> 2. power_off pd->peer
>>>>> --> no requests
>>>>
>>>> I am not sure why there would be no requests, since we do end up aggregating
>>>> with pd->peer->corner = 0.
>>>> So the final state would be
>>>>
>>>> RPMH_ACTIVE_ONLY_STATE = max(6, 0) = 6
>>>> RPMH_WAKE_ONLY_STATE = 6
>>>> RPMH_SLEEP_STATE = max(6, 0) = 6
>>>
>>> Argh, my example was ruined by a one character typo. I meant to have:
>>>
>>> Initial state:
>>> pd->level[0] != 0
>>>
>>>
>>>>>
>>>>> Final state:
>>>>> RPMH_ACTIVE_ONLY_STATE = 7
>>>>> RPMH_WAKE_ONLY_STATE = 7
>>>>> RPMH_SLEEP_STATE = 6
>>>>>
>>>>> Case B:
>>>>> 1. power_off pd->peer
>>>>> --> no requests
>>>>
>>>> Here it would be again be aggregation based on pd->peer->corner = 0
>>>> so,
>>>> RPMH_ACTIVE_ONLY_STATE = max(5, 0) = 5
>>>> RPMH_WAKE_ONLY_STATE = 5
>>>> RPMH_SLEEP_STATE = max(5, 0) = 5
>>>>
>>>>> 2. set pd->corner = 6
>>>>> --> new value requests: RPMH_ACTIVE_ONLY_STATE = 6,
>>>>> RPMH_WAKE_ONLY_STATE = 6, RPMH_SLEEP_STATE = 6
>>>>>
>>>>> Final state:
>>>>> RPMH_ACTIVE_ONLY_STATE = 6
>>>>> RPMH_WAKE_ONLY_STATE = 6
>>>>> RPMH_SLEEP_STATE = 6
>>>>
>>>> correct,
>>>> RPMH_ACTIVE_ONLY_STATE = max(6, 0) = 6
>>>> RPMH_WAKE_ONLY_STATE = 6
>>>> RPMH_SLEEP_STATE = max(6, 0) = 6
>>>>
>>>>>
>>>>> Without the check, Linux would vote for the lowest supported level when
>>>>> power_off is called. This seems semantically reasonable given that the
>>>>> consumer is ok with the power domain going fully off and that would be the
>>>>> closest that we can get.
>>>>
>>>> So are you suggesting I replace
>>>>
>>>>>> + if (pd->level[0] == 0)
>>>>>> + ret = rpmhpd_aggregate_corner(pd, 0);
>>>>
>>>> with
>>>>
>>>>>> + ret = rpmhpd_aggregate_corner(pd, pd->level[0]);
>>>
>>> Yes, this is the modification that I'm requesting.
>>>
>>>
>>>> I can see what you said above makes sense but if its
>>>>> Initial state:
>>>>> pd->level[0] != 0
>>>>
>>>> Was that what you meant?
>>>
>>> Yes.
>>
>> Apologize for jumping into the discussion.
>>
>> I have a couple of ideas, that may simplify/improve things related to the above.
>>
>> 1) Would it be easier if genpd calls ->set_performance_state(0) before
>> it is about to call the ->power_off() callback? Actually Viresh wanted
>> that from the start, but I thought it was useless.
>
> This sounds like a relatively reasonable thing to do that falls in line
> with the semantics of the API. It would also necessitate genpd
> aggregating performance state requests again when ->power_on() is called
> so that ->set_performance_state() can be called with an appropriate value.
>
> I think that the issue that I identified above should be solved outright
> within the rpmhpd driver though. It is a bug in the RPMh-specific active
> + sleep vs active-only aggregation logic.
>
> The feature you are describing here seems more like a power optimization
> that would help in the case of consumer disabling the power domain without
> scaling down the performance level for a power domain that does not
> support enable/disable.

Right. Seems like this isn't needed then.

The genpd driver can just implement the callbacks instead.

>
> Would this behavior in genpd be implemented per power domain or per consumer?
>
>
>> 2) When device are becoming runtime suspended, the ->runtime_suspend()
>> callback of genpd is invoked (genpd_runtime_suspend()). However, in
>> there we don't care to re-evaluate the votes on the performance level,
>> but instead rely on the corresponding driver for the device to drop
>> the vote. I think it would be a good idea of managing this internally
>> in genpd instead, thus, depending on if the aggregated vote can be
>> decreased we should call ->set_performance_state(new-vote). Of
>> course, once the device get runtime resumed, the votes needs to be
>> restored for the device.
>
> This feature sounds a little risky. Is it really safe for all cases for
> the genpd framework to unilaterally make these kind of decisions for
> consumers? Could there be any interdependencies between resources that
> consumers need to enforce that would not be possible if genpd handled
> power state reduction automatically (for example two power domains with a
> sequencing requirement between them)?

In regards to resource/device dependencies, those needs to be properly
manged anyway when using runtime PM. For that we have mechanism for
dealing with parent-childs and the so called device links for
functional dependencies.

For sequencing, that may be an issue, as currently we don't propagate
performance states hierarchically inside genpd. I know Viresh is
looking at addressing this, so perhaps we should come back to this
within that context instead.

Kind regards
Uffe