Re: [PATCH v9 09/17] arm: tegra20: cpuidle: Handle case where secondary CPU hangs on entering LP2

From: Dmitry Osipenko
Date: Fri Feb 21 2020 - 15:54:14 EST


21.02.2020 23:48, Daniel Lezcano wrote:
> On 21/02/2020 21:21, Dmitry Osipenko wrote:
>> 21.02.2020 23:02, Daniel Lezcano wrote:
>
> [ ... ]
>
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * The primary CPU0 core shall wait for the secondaries
>>>>>>>> + * shutdown in order to power-off CPU's cluster safely.
>>>>>>>> + * The timeout value depends on the current CPU frequency,
>>>>>>>> + * it takes about 40-150us in average and over 1000us in
>>>>>>>> + * a worst case scenario.
>>>>>>>> + */
>>>>>>>> + do {
>>>>>>>> + if (tegra_cpu_rail_off_ready())
>>>>>>>> + return 0;
>>>>>>>> +
>>>>>>>> + } while (ktime_before(ktime_get(), timeout));
>>>>>>>
>>>>>>> So this loop will aggressively call tegra_cpu_rail_off_ready() and retry 3
>>>>>>> times. The tegra_cpu_rail_off_ready() function can be called thousands of times
>>>>>>> here but the function will hang 1.5s :/
>>>>>>>
>>>>>>> I suggest something like:
>>>>>>>
>>>>>>> while (retries-- && !tegra_cpu_rail_off_ready())
>>>>>>> udelay(100);
>>>>>>>
>>>>>>> So <retries> calls to tegra_cpu_rail_off_ready() and 100us x <retries> maximum
>>>>>>> impact.
>>>>>> But udelay() also results in the CPU spinning in a busy-loop, and thus,
>>>>>> what's the difference?
>>>>>
>>>>> busy looping instead of register reads with all the hardware things involved behind.
>>>>
>>>> Please notice that this code runs only on an older Cortex-A9/A15, which
>>>> doesn't support WFE for the delaying, and thus, CPU always busy-loops
>>>> inside udelay().
>>>>
>>>> What if I add cpu_relax() to the loop? Do you think it could
>>>> have any positive effect?
>>>
>>> I think udelay() has a call to cpu_relax().
>>
>> Yes, my point is that udelay() doesn't bring much benefit for us here
>> because:
>>
>> 1. we want to enter the power-gated state as quickly as possible and
>> udelay() just adds an unnecessary delay
>>
>> 2. udelay() spins in a busy-loop until delay is expired, just like we're
>> doing it in this function already
>
> In this case why not remove ktime_get() and increase the number of retries?

Because the busy-loop performance depends on the CPU's frequency, so we
can't rely on a bare retry count: the same number of iterations takes a
different amount of wall-clock time at different clock rates.
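
To illustrate the point, here is a minimal userspace sketch (not the patch
code) of a poll loop bounded by a monotonic-clock deadline rather than by an
iteration count. Everything in it is an assumption made for illustration:
now_ns() stands in for ktime_get(), poll_until_deadline() is a hypothetical
helper, and always_ready()/never_ready() are stubs standing in for
tegra_cpu_rail_off_ready().

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds; userspace stand-in for ktime_get(). */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Poll ready() until it returns true or timeout_us microseconds elapse.
 * The bound is wall-clock time, so it does not change with how fast the
 * CPU happens to execute the loop body.
 * Returns 0 if ready() succeeded, -1 on timeout.
 */
static int poll_until_deadline(bool (*ready)(void), int64_t timeout_us)
{
	int64_t deadline;

	assert(ready != NULL);
	deadline = now_ns() + timeout_us * 1000;

	do {
		if (ready())
			return 0;
	} while (now_ns() < deadline);

	return -1;
}

/* Hypothetical stubs standing in for the real hardware status check. */
static bool always_ready(void) { return true; }
static bool never_ready(void) { return false; }
```

With a fixed retry count the loop would finish sooner at a high CPU frequency
and later at a low one; with the deadline the worst case is the timeout itself
at any frequency, and the loop still returns as soon as the condition is met.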