Re: [patch] i386: halt the CPU on serious errors

From: Andreas Mohr
Date: Tue Jun 20 2006 - 03:49:06 EST


Hi,

On Tue, Jun 20, 2006 at 12:55:25AM -0400, Chuck Ebbert wrote:
> Halt the CPU when serious errors are encountered and we
> deliberately go into an infinite loop.
>
> Suggested by Andreas Mohr.

Thanks for the very fast patch, but:

> 	/* Assume hlt works */
> -	halt();
> -	for(;;);
> +	for (;;)
> +		halt();

The comment above seems to hint at stop_this_cpu() in
arch/i386/kernel/smp.c, which does a proper hlt check:

	if (cpu_data[smp_processor_id()].hlt_works_ok)
		for(;;) halt();
	for (;;);

So I'm unsure what happens with your patch if hlt does not work properly
on a given CPU. Maybe we'd then hit an invalid opcode exception over and
over inside that loop.

I think a patch should do the following (or more):

- try to group the various CPU emergency stop places together
- add a comment about trying to avoid overheating, mentioning ACPI
  thermal protection specifications and possibly *missing protection*
  in non-ACPI systems/modes (APM!)
- if (hlt_works_ok), do a hlt loop
- if not, do a cpu_relax() loop (or even call cpu_relax() three times
  per for (;;) iteration, for lower activity?)
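
A rough userspace sketch of the last two list items, not actual kernel
code: halt() and cpu_relax() here are counting stand-ins for the kernel
primitives of the same name, and emergency_stop() with its max_iters bound
is a hypothetical helper added only so the loop terminates and can be
exercised. A real emergency stop would loop forever.

```c
#include <stdbool.h>

/* Counters so the stand-ins below can be observed from a test. */
static unsigned long halt_calls;
static unsigned long relax_calls;

/* Stand-in for the kernel's halt(); userspace cannot execute hlt. */
static void halt(void)
{
	halt_calls++;
}

/* Stand-in for the kernel's cpu_relax(). */
static void cpu_relax(void)
{
	relax_calls++;
}

/*
 * Hypothetical common stop helper: use hlt when it is known to work,
 * otherwise fall back to cpu_relax(). The max_iters bound is an
 * assumption for demonstration; the kernel version would be for (;;).
 */
static void emergency_stop(bool hlt_works_ok, unsigned long max_iters)
{
	unsigned long i;

	for (i = 0; i < max_iters; i++) {
		if (hlt_works_ok)
			halt();
		else
			cpu_relax();
	}
}
```

In the kernel, the flag would presumably come from
cpu_data[smp_processor_id()].hlt_works_ok, as in stop_this_cpu().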

One thing that may be worth pondering is whether using cpu_relax()
might actually keep the CPU slightly below the ACPI shutdown temperature
limit and thus do *more* harm with a broken fan. In that case we might
want to check for active ACPI mode and, if it is active, do a busy loop
instead. This however may cause the temperature to cycle (will a CPU that
has shut down reactivate itself after cooling down again?), so a
cpu_relax() loop might still be better.

Andreas Mohr