Re: [PATCH] thermal: core: Add a back up thermal shutdown mechanism

From: Tero Kristo
Date: Wed Apr 12 2017 - 14:43:20 EST


On 12/04/17 20:24, Eduardo Valentin wrote:
On Wed, Apr 12, 2017 at 10:41:00PM +0530, Keerthy wrote:


On Wednesday 12 April 2017 10:38 PM, Grygorii Strashko wrote:


On 04/12/2017 11:44 AM, Keerthy wrote:


On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote:


On 04/12/2017 10:44 AM, Eduardo Valentin wrote:
Hello,

...


I agree. But there it nothing that says it is not reenterable. If you
saw something in this line, can you please share?

will you generate a patch to do this?
Sure. I will generate a patch to take care of 1) To make sure that
orderly_poweroff is called only once right away. I have already
tested.

for 2) Cancel all the scheduled work queues to monitor the
temperature.
I will take some more time to make it and test.

Is that okay? Or you want me to send both together?

I think you can send patch for step 1 first.

I am happy to see that Keerthy found the problem with his setup and a
possible solution. But I have a few concerns here.

1. If regular shutdown process takes 10seconds, that is a ballpark that
thermal should never wait. orderly_poweroff() calls run_cmd() with wait
flag set. That means, if regular userland shutdown takes 10s, we are
waiting for it. Obviously this not acceptable. Specially if you setup
critical trip to be 125C. Now, if you properly size the critical trip to
fire before hotspot really reach 125C, for 10s (or the time it takes to
shutdown), then fine. But based on what was described in this thread,
his system is waiting 10s on regular shutdown, and his silicon is on
out-of-spec temperature for 10s, which is wrong.

2. The above scenario is not acceptable in a long run, specially from a
reliability perspective. If orderly_poweroff() has a possibility to
simply never return (or take too long), I would say the thermal
subsystem is using the wrong API.

^ this question just repeat everything which was already discussed in
previous versions of this patch - orderly_poweroff() is not good for critical shutdown/poweroff,
but what to use instead?

It is still useful on a properly sized system. The point is the thermal
subsystem still wants to give one opportunity to gracefully shutdown the
running system on a thermal scenario, as I explained in the other email.
But, you have to do this accounting the down time, and your reliability
concerns.






Hh, I do not see that orderly_poweroff() will wait for anything now:
void orderly_poweroff(bool force)
{
if (force) /* do not override the pending "true" */
poweroff_force = true;
schedule_work(&poweroff_work);
^^^^^^^ async call. even here can be pretty big delay if system is under pressure
}


static int __orderly_poweroff(bool force)
{
int ret;

ret = run_cmd(poweroff_cmd);

When i tried with multiple orderly_poweroff calls ret was always 0.
So every 250mS i see this ret = 0.

^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC

if (ret && force) {

So it never entered this path. ret = 0 so if is not executed.

correct, because exec can find poweroff tool and start it, so you,
most probably, have bunch of this tool instance running in parallel (some of them can fail or block)
Issue 1 - you've sent fix for is actual :).

Precisely yes!


As I mentioned, the fix is a two fold, a. avoid spam of
orderly_poweroff(), but make sure eventually we shutdown.

Just chirping in here a bit myself also, the long latencies in the poweroff executing are basically because in our case it will do all of the following:

- stop all running daemons
- kill all remaining processes
- unload all modules
- sync / unmount all filesystems
- etc.
- poweroff the system when everything else has been gracefully done

The order of these things are not necessarily what I listed above, but overall it takes quite a bit of time. It doesn't matter if you execute all of this over NFS or SD card or ramdisk, it is a long procedure.

-Tero



Again, thermal has no control of power off process once run_cmd() is returned,
and it do not know what US poweroff binary is doing and how much time can it take
(which include disks maintenance - loooong delay).


pr_warn("Failed to start orderly shutdown: forcing the issue\n");

/*
* I guess this should try to kick off some daemon to sync and
* poweroff asap. Or not even bother syncing if we're doing an
* emergency shutdown?
*/
emergency_sync();
kernel_power_off();
^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist
}

return ret;
}

static bool poweroff_force;

static void poweroff_work_func(struct work_struct *work)
{
__orderly_poweroff(poweroff_force);
}

As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result
of US poweroff binary execution.


If you are going to implement the above two patches, keep in mind:
i. At least within the thermal subsystem, you need to take care of all
zones that could trigger a shutdown.
ii. serializing the calls to orderly_poweroff() seams to be more
concerning than cancelling all monitoring.