Re: Why reassert ix86 NMI?

From: Richard B. Johnson (root@chaos.analogic.com)
Date: Tue Jun 06 2000 - 16:29:09 EST


On Tue, 6 Jun 2000, Maciej W. Rozycki wrote:

> On Tue, 6 Jun 2000, Richard B. Johnson wrote:
>
> > Not correct. If you send a bad descriptor to INT 0x15, function 0x87,
> > the processor will reset (crash). The shutdown-byte is checked early
>
> I can't recall Linux using INT 0x15, function 0x87... This is
> irrelevant.
>

LILO uses it to load the kernel above the "1 megabyte boundary".
This is, therefore, quite relevant.
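
For reference, here is a rough sketch in C of one 8-byte entry in the
descriptor table that the block-move call takes at ES:SI. The layout
follows the commonly documented PC/AT convention. The function name is
made up for illustration, and nothing here is taken from the LILO
source.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill one 8-byte descriptor of the kind INT 0x15, function 0x87 reads
 * from its table (hypothetical helper, PC/AT layout assumed).
 */
void set_block_move_descriptor(uint8_t d[8], uint32_t base, uint16_t limit)
{
    memset(d, 0, 8);
    d[0] = (uint8_t)(limit & 0xFF);          /* segment limit, low byte  */
    d[1] = (uint8_t)(limit >> 8);            /* segment limit, high byte */
    d[2] = (uint8_t)(base & 0xFF);           /* base address bits 0-7    */
    d[3] = (uint8_t)((base >> 8) & 0xFF);    /* base address bits 8-15   */
    d[4] = (uint8_t)((base >> 16) & 0xFF);   /* base address bits 16-23  */
    d[5] = 0x93;                             /* present, ring 0, writable data */
    /* d[6] stays zero; d[7] carries base bits 24-31 on a 386 or later. */
    d[7] = (uint8_t)((base >> 24) & 0xFF);
}
```

A destination at the 1 megabyte boundary with a 64K limit would be
set_block_move_descriptor(d, 0x00100000, 0xFFFF). A wrong access byte
or base in one of these entries is the kind of "bad descriptor" that
sends the processor off into the weeds.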
 
> > Now, imagine what happens if the machine was just started from a
> > cold-boot and 0x09 was found in the CMOS shutdown byte. This will
> > result in a crash requiring an on-site reset. The reset will work,
> > because POST will have changed the shutdown byte to 0 before it
> > did the '0x09 thing'. However, you need to physically be there.
>
> If you start from a cold-boot, the BIOS checks for a cold startup
> condition in the 8042 before checking the shutdown byte. It's usually
> just after executing a far jump at 0xf000:0xfff0. There is only a smsw

The BIOS for most new processors, i.e., those after the 486/DX, does
not do a far jump at 0xF000:FFF0. What you see there with `debug` is
after the BIOS has been shadowed. That vector usually points to some
low-memory code before jumping to the previous vector.

Upon startup, the BIOS does _not_ exist at 0xF000:0000, with the
reset-vector at 0xF000:FFF0!!

Instead it exists at absolute address 0xFFFFFFF0. This is called the
reset vector. Even though the processor is in real mode, the 32-bit CS
addressing works because the internal CS descriptor cache keeps the
upper bits of its base address set high.

This condition continues as long as only 'near' jumps occur. Therefore,
the reset vector contains a near jump to the code that determines how
the processor reset occurred. If it was a hard reset, the BIOS PROM (NVRAM)
has to be enabled at 0xF000:0000 before a far jump occurs. This far jump
is, therefore, never at 0xF000:FFF0, even though it's there by the time
you look at it with `debug`.
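
To make the arithmetic concrete, here is a minimal sketch in C. It
assumes the usual 386-and-later reset state (cached CS base of
0xFFFF0000, IP of 0xFFF0). The struct and function names are invented
for illustration only.

```c
#include <stdint.h>

/* Hidden (cached) part of CS; only the base matters for this sketch. */
struct seg_cache {
    uint32_t base;
};

/* Address the processor fetches from: cached base plus 16-bit offset. */
uint32_t fetch_address(const struct seg_cache *cs, uint16_t ip)
{
    return cs->base + ip;
}

/* CS right after reset: the cached base has its upper bits set high. */
struct seg_cache cs_at_reset(void)
{
    struct seg_cache cs = { 0xFFFF0000u };
    return cs;
}

/* A real-mode far jump reloads CS, so the base becomes selector * 16. */
struct seg_cache cs_after_far_jump(uint16_t selector)
{
    struct seg_cache cs = { (uint32_t)selector << 4 };
    return cs;
}
```

With cs_at_reset() and an offset of 0xFFF0 the fetch address is
0xFFFFFFF0. After cs_after_far_jump(0xF000) the same offset gives
0x000FFFF0, which is why the far jump has to wait until something
useful is mapped there.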

After the BIOS is shadowed, a 'warm-boot' vector is set at 0xF000:FFF0.
This is the 'far' jump you see with debug. This is where DOS (and maybe
Windows) goes when you hit Ctrl-Alt-Del.

Incidentally, the PCI specification requires that all BIOS modules
run in RAM, not PROM. Therefore most everything you see with debug
has the potential of being somewhat different than what is actually
burned into PROM (or NVRAM) and different than what executed upon
power-up.

> in-between to check the BIOS is really in the real mode. For a cold
> startup the shutdown byte is ignored and the POST is performed. Zeroing
> of the shutdown byte is a standard step of the POST.
>

A processor reset from protected mode does not affect the cold-start
status of the keyboard controller. A power failure or the power switch
results in undefined status in the keyboard controller. That's why it
is reset/reprogrammed if a check of the CMOS shutdown byte shows that
the reset was not a processor-only reset from protected mode. Note
that the keyboard controller is a uP with mask-ROM code. It takes
time for it to come out of reset during power-up. That's why its
status is undefined.

Hitting the reset switch does affect the keyboard controller. Since
it is possible for the machine to be in the protected-mode copy
routine when somebody hits the reset switch, the keyboard controller
status is checked after the CMOS check. This picks up the reset
operation. Ctrl-Alt-Del can't affect protected-mode copy because
the interrupts are disabled during this operation.
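
The order of the checks described above can be sketched in C as
follows. This is an illustration only, not code from any actual BIOS.
It assumes the shutdown byte lives at CMOS index 0x0F, that 0x09 is
the block-move return code, and that bit 2 of the keyboard controller
status (the system flag) is clear until the controller has been
initialized. The names are hypothetical, and the other shutdown codes
are ignored.

```c
#include <stdint.h>

#define SHUTDOWN_BLOCK_MOVE 0x09  /* return from INT 0x15, function 0x87 */
#define KBC_STATUS_SYSFLAG  0x04  /* status bit 2: set after controller init */

enum reset_action {
    RESUME_BLOCK_MOVE,  /* processor-only reset out of protected mode */
    FULL_POST           /* power-up or reset switch: reinitialize everything */
};

/*
 * shutdown_byte: value read from CMOS index 0x0F.
 * kbc_status:    value read from the keyboard controller status port.
 */
enum reset_action classify_reset(uint8_t shutdown_byte, uint8_t kbc_status)
{
    /* First the CMOS check: anything else is treated as a full reset here. */
    if (shutdown_byte != SHUTDOWN_BLOCK_MOVE)
        return FULL_POST;

    /* Then the keyboard controller: a cleared flag means it was reset too. */
    if ((kbc_status & KBC_STATUS_SYSFLAG) == 0)
        return FULL_POST;

    return RESUME_BLOCK_MOVE;
}
```

The second test is what picks up a reset switch hit in the middle of
the copy: the shutdown byte still says 0x09, but the controller no
longer looks initialized.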
 
> If your BIOS does not perform like this, complain to your vendor -- it's
> a standard sequence since 80286.
>

I am a BIOS 'vendor'.

> > That is the purpose of the sequence. It is hardly dangerous. Its
> > specific purpose is to cause another NMI if the latch is still set.
>
> Surely -- the problem is it cannot be easily done without a side effect
> of reprogramming the index.
>

It does not reprogram the index. The high bit of the index byte is the
NMI mask bit, not part of the register address, so an index of 0x80
selects the same CMOS register as 0x00.
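
A small sketch in C of how the byte written to port 0x70 splits up,
assuming the classic PC/AT arrangement where bit 7 gates NMI and bits
0-6 select the register. The macro and function names are made up for
illustration, and no port I/O is done here.

```c
#include <stdint.h>

#define CMOS_NMI_MASK_BIT 0x80u  /* bit 7 of the byte written to port 0x70 */
#define CMOS_INDEX_MASK   0x7Fu  /* bits 0-6 select the CMOS/RTC register  */

/* Register the CMOS actually sees for a given port 0x70 value. */
uint8_t cmos_register_selected(uint8_t port70_value)
{
    return (uint8_t)(port70_value & CMOS_INDEX_MASK);
}

/* Nonzero when the written value has the NMI gate bit set. */
int nmi_bit_set(uint8_t port70_value)
{
    return (port70_value & CMOS_NMI_MASK_BIT) != 0;
}
```

So cmos_register_selected(0x80) and cmos_register_selected(0x00) both
give register 0, and only nmi_bit_set() differs between the two.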
 
> > You could have the time be off by a maximum of 59 seconds. Since
> > it's BCD, there is an additional probability consideration since
> > the registers will mask (not wrap) to '99', which is '99' - '59' ='40'
> > so with all bits set on a crash, it's most likely that the time will
> > be off by 40 seconds.
>
> If you use the RTC as a fallback NTP reference clock you may make your
> network clients unhappy.
>

Sure, but it's better than not booting until somebody hits the reset
button. The whole idea is to make sure that, should power fail, the
system will reboot when power is restored. You help do this by making
sure the shutdown byte cannot be corrupted during the power failure.

Cheers,
Dick Johnson

Penguin : Linux version 2.3.41 on an i686 machine (800.63 BogoMips).

This archive was generated by hypermail 2b29 : Wed Jun 07 2000 - 21:00:26 EST