Re: [ELISA Safety Architecture WG] [PATCH v2 0/2] Introduce the pkill_on_warn parameter
From: James Bottomley
Date: Tue Nov 16 2021 - 08:21:06 EST
On Tue, 2021-11-16 at 09:41 +0100, Petr Mladek wrote:
[...]
> If I wanted to implement a super-reliable panic() I would
> use some external device that would cause power-reset when
> the watched device is not responding.
They're called watchdog timers. We have a whole subsystem full of
them:

    drivers/watchdog

We used them in old cluster HA systems to guarantee successful recovery
of shared state from contaminated cluster members, but I think they'd
serve the reliable panic need equally well. Most server-class systems
today have them built in (on the BMC if they don't have a separate
mechanism); they're just not usually activated.
James
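
For readers unfamiliar with the subsystem, here is a minimal userspace
sketch of how a watchdog exposed by drivers/watchdog is typically armed
and kept alive through the character-device interface. It is an
illustration, not something taken from this thread: the helper names
wd_arm, wd_pet and wd_disarm are hypothetical, the device path is
assumed to be something like /dev/watchdog, and whether the timeout
request or the "magic close" is honoured depends on the driver and on
options such as nowayout.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: open the watchdog device (opening it normally
 * starts the countdown) and request a timeout in seconds.
 * Returns the file descriptor, or -1 if the device cannot be opened. */
int wd_arm(const char *path, int timeout_s)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    /* The driver may round or reject the value; this sketch ignores
     * the result and keeps whatever timeout the driver already had. */
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout_s);
    return fd;
}

/* Hypothetical helper: reset the countdown. If this stops being called
 * (for example because the system has hung), the hardware resets the
 * machine once the timeout expires. Returns 0 on success, -1 on error. */
int wd_pet(int fd)
{
    return ioctl(fd, WDIOC_KEEPALIVE, 0);
}

/* Hypothetical helper: write the "magic close" character 'V' before
 * closing, which asks drivers that support it to stop the timer.
 * Returns 0 if the character was written, -1 otherwise. */
int wd_disarm(int fd)
{
    ssize_t n = write(fd, "V", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

A monitoring daemon would call wd_arm() once, then call wd_pet() in a
loop at an interval comfortably shorter than the timeout. For the
reliable-panic use discussed above, the point is simply that nothing
needs to run at failure time: the reset happens because the petting
stops.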