Re: [PATCH v9 02/10] reboot: Add hardware protection power-off

From: Petr Mladek
Date: Wed May 12 2021 - 04:20:29 EST


On Mon 2021-05-10 14:28:30, Matti Vaittinen wrote:
> There can be few cases when we need to shut-down the system in order to
> protect the hardware. Currently this is done at east by the thermal core
> when temperature raises over certain limit.
>
> Some PMICs can also generate interrupts for example for over-current or
> over-voltage, voltage drops, short-circuit, ... etc. On some systems
> these are a sign of hardware failure and only thing to do is try to
> protect the rest of the hardware by shutting down the system.
>
> Add shut-down logic which can be used by all subsystems instead of
> implementing the shutdown in each subsystem. The logic is stolen from
> thermal_core with difference of using atomic_t instead of a mutex in
> order to allow calls directly from IRQ context.
>
> Signed-off-by: Matti Vaittinen <matti.vaittinen@xxxxxxxxxxxxxxxxx>
>
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index a6ad5eb2fa73..5da8c80a2647 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -518,6 +519,85 @@ void orderly_reboot(void)
> }
> EXPORT_SYMBOL_GPL(orderly_reboot);
>
> +/**
> + * hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay
> + * @work: work_struct associated with the emergency poweroff function
> + *
> + * This function is called in very critical situations to force
> + * a kernel poweroff after a configurable timeout value.
> + */
> +static void hw_failure_emergency_poweroff_func(struct work_struct *work)
> +{
> + /*
> + * We have reached here after the emergency shutdown waiting period has
> + * expired. This means orderly_poweroff has not been able to shut off
> + * the system for some reason.
> + *
> + * Try to shut down the system immediately using kernel_power_off
> + * if populated
> + */
> + WARN(1, "Hardware protection timed-out. Trying forced poweroff\n");
> + kernel_power_off();

WARN() look like an overkill here. It prints many lines that are not
much useful in this case. The function is called from well-known
context (workqueue worker).

Also be aware that "panic_on_warn" commandline option will trigger
panic() here.


> + /*
> + * Worst of the worst case trigger emergency restart
> + */
> + WARN(1,
> + "Hardware protection shutdown failed. Trying emergency restart\n");
> + emergency_restart();

Two consecutive WARN() calls are even less useful. They are eye
catching but it is hard to find the only useful line with
the custom message.

Best Regards,
Petr