Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism

From: Eduardo Valentin
Date: Wed Apr 12 2017 - 12:55:09 EST


Keerthy,

On Wed, Apr 12, 2017 at 10:14:36PM +0530, Keerthy wrote:
>
>
> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote:
> >
> >
> > On 04/12/2017 10:44 AM, Eduardo Valentin wrote:
> >> Hello,
> >>
> > ...
> >
> >>
> >> I agree. But there is nothing that says it is not reentrant. If you
> >> saw something in this line, can you please share?
> >>
> >>>>> will you generate a patch to do this?
> >>>> Sure. I will generate a patch to take care of 1) To make sure that
> >>>> orderly_poweroff is called only once right away. I have already
> >>>> tested.
> >>>>
> >>>> for 2) Cancel all the scheduled work queues to monitor the
> >>>> temperature.
> >>>> I will take some more time to make it and test.
> >>>>
> >>>> Is that okay? Or you want me to send both together?
> >>>>
> >>> I think you can send patch for step 1 first.
> >>
> >> I am happy to see that Keerthy found the problem with his setup and a
> >> possible solution. But I have a few concerns here.
> >>
> >> 1. If the regular shutdown process takes 10 seconds, that is a ballpark
> >> that thermal should never wait. orderly_poweroff() calls run_cmd() with
> >> the wait flag set. That means, if regular userland shutdown takes 10s,
> >> we are waiting for it. Obviously this is not acceptable, especially if
> >> you set up the critical trip to be 125C. Now, if you properly size the
> >> critical trip to fire before the hotspot really reaches 125C, for 10s
> >> (or the time it takes to shut down), then fine. But based on what was
> >> described in this thread, his system is waiting 10s on regular shutdown,
> >> and his silicon is at out-of-spec temperature for 10s, which is wrong.
> >>
> >> 2. The above scenario is not acceptable in the long run, especially from
> >> a reliability perspective. If orderly_poweroff() has a possibility to
> >> simply never return (or take too long), I would say the thermal
> >> subsystem is using the wrong API.
> >>
> >
> >
> > Hm, I do not see that orderly_poweroff() will wait for anything now:
> > void orderly_poweroff(bool force)
> > {
> > 	if (force) /* do not override the pending "true" */
> > 		poweroff_force = true;
> > 	schedule_work(&poweroff_work);
> > 	^^^^^^^ async call. even here can be pretty big delay if system is under pressure
> > }
> >
> >
> > static int __orderly_poweroff(bool force)
> > {
> > 	int ret;
> >
> > 	ret = run_cmd(poweroff_cmd);
>
> When I tried with multiple orderly_poweroff() calls, ret was always 0.
> So every 250 ms I see ret = 0.
>
> > ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC
> >
> > if (ret && force) {
>
> So it never entered this path. ret = 0, so the if body is not executed.

I think your setup has two major problems then:
1. when the kernel runs the userspace power off command, the exec reports
success (ret = 0), but the power off is in fact not triggered.
2. when you finally exec it, it takes 5s to finish.

If this is correct, I think my suggestions in the other email
still hold.
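
For reference, the "call orderly_poweroff() only once" step discussed
earlier in this thread could look roughly like the sketch below. This is
plain userspace C, not the actual patch and not kernel code:
fake_orderly_poweroff(), poweroff_requests and critical_trip_poweroff()
are hypothetical names made up for illustration, and a C11 atomic_flag
stands in for whatever synchronization the thermal core would really use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for orderly_poweroff(): it only counts requests,
 * so that the once-only behaviour can be observed. */
static int poweroff_requests;

static void fake_orderly_poweroff(bool force)
{
	(void)force;
	poweroff_requests++;
}

/* Set by the first critical-trip event that requests a power off. */
static atomic_flag poweroff_requested = ATOMIC_FLAG_INIT;

/*
 * Assumed to be called on every critical-trip poll (e.g. every 250 ms).
 * Only the first caller issues the power off request; later callers
 * return without queuing it again.
 * Returns true if this call issued the request.
 */
static bool critical_trip_poweroff(void)
{
	if (atomic_flag_test_and_set(&poweroff_requested))
		return false;	/* request already issued */

	fake_orderly_poweroff(true);
	return true;
}
```

Note that a guard like this only avoids re-queuing the request on every
poll. It does nothing about a userspace power off that is never triggered
or takes too long, which are the two problems listed above.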

BR,
