Re: [PATCH 1/2] boot: ignore early NMIs

From: Eric W. Biederman
Date: Mon Mar 12 2012 - 15:57:50 EST


"H. Peter Anvin" <hpa@xxxxxxxxx> writes:

> On 03/11/2012 11:14 PM, Fernando Luis Vázquez Cao wrote:
>>
>> The thing is that we want to avoid playing with hardware in the kdump
>> reboot path when we can avoid it, the premise being that it cannot
>> be accessed without risking a lockup or worse (as the deadlock accessing
>> the I/O APIC showed). The kernel is crashing after all. What is more,
>> I forgot to mention that the long term goal is to leave the LAPIC
>> untouched too (we really want to keep the number of things we do in the
>> context of the crashing kernel to the bare minimum), so we would still
>> need to fix the early IDT.
>>
>> My patch set just installs a special handler for the NMI case so I think
>> it is pretty simple and self contained.
>>
>> Another reason to apply these patches is to be consistent with the rest
>> of the kernel. Spurious NMIs that would have been ignored after installing
>> the final IDT would cause the system to halt if they happen
>> to arrive while the early IDT is in place.
>>
>
> I'm concerned that you're adding failure modes because you don't want to
> solve the real problem which is you need to block this at the source.
> It is way more than the IDT that has to work (at the very least, you
> need the GDT and a working stack) at all times in order for NMIs to be
> receivable. That doesn't address what happens if you're getting an NMI
> storm either.

Good criticism.

The basic problem is what do we do when we receive NMIs during the
kernel boot. Dying mysteriously certainly isn't a good solution.

In the kexec on panic code path we already have a stack and as long
as we can fit the GDT and the LDT on that same page we can have all of
the rest during the entire transition.

After that it is basically the kernel's boot code.

The basic problem is: which source do we block this at? How many
sources are there? And architecturally, last I looked, x86 no longer
has an NMI disable. EFI and similar systems want to get away without
a legacy CMOS clock because designers so often get them wrong.
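
For readers unfamiliar with the legacy mechanism: on PC/AT-style
hardware the NMI disable has traditionally been bit 7 of the CMOS/RTC
index port 0x70, which is why it goes away along with the legacy CMOS
clock. The sketch below only computes the byte that would be written
to that port. The helper name cmos_index_byte() is hypothetical, it is
not part of the patch set under discussion, and it assumes the
traditional port layout, which newer platforms may not provide.

```c
#include <stdint.h>

/*
 * Legacy PC/AT convention (assumed here, not guaranteed on modern
 * platforms): port 0x70 selects a CMOS/RTC register in bits 6:0,
 * and bit 7 gates NMI delivery to the CPU (1 = NMI masked).
 */
#define CMOS_INDEX_PORT 0x70u
#define CMOS_NMI_MASK   0x80u

/*
 * Hypothetical helper: build the byte to write to CMOS_INDEX_PORT.
 * reg        - CMOS register index, only bits 6:0 are used
 * nmi_masked - non-zero to mask NMIs, zero to leave them enabled
 */
static inline uint8_t cmos_index_byte(uint8_t reg, int nmi_masked)
{
	uint8_t v = reg & 0x7Fu;	/* keep only the index bits */

	if (nmi_masked)
		v |= CMOS_NMI_MASK;	/* set bit 7 to mask NMIs */
	return v;
}
```

In a kernel context the result would be handed to something like
outb(cmos_index_byte(reg, 1), CMOS_INDEX_PORT). Even where this works
it only gates NMIs at one point in the chipset, so it does not answer
the question of how many sources there are.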

Eric