Re: [PATCH 1/2] boot: ignore early NMIs

From: Fernando Luis Vázquez Cao
Date: Mon Mar 12 2012 - 21:43:49 EST


On 03/13/2012 03:40 AM, H. Peter Anvin wrote:

> On 03/11/2012 11:14 PM, Fernando Luis Vázquez Cao wrote:
>> The thing is that we want to avoid playing with hardware in the kdump
>> reboot path when we can avoid it, the premise being that it cannot
>> be accessed without risking a lockup or worse (as the deadlock accessing
>> the I/O APIC showed). The kernel is crashing after all. What is more,
>> I forgot to mention that the long term goal is to leave the LAPIC
>> untouched too (we really want to keep the number of things we do in the
>> context of the crashing kernel to the bare minimum), so we would still
>> need to fix the early IDT.
>>
>> My patch set just installs a special handler for the NMI case so I think
>> it is pretty simple and self contained.
>>
>> Another reason to apply these patches is to be consistent with the rest
>> of the kernel. Spurious NMIs that would have been ignored after installing
>> the final IDT would cause the system to halt if they happen
>> to arrive while the early IDT is in place.
>
> I'm concerned that you're adding failure modes


This patch set just brings the early IDT in line with what we do after
switching to the final IDT, i.e. we ignore NMIs. The only difference is
that we do not honor panic*nmi and unknown_nmi_panic. As things
stand now the kernel will sometimes mysteriously hang. I really
think that independently of the kdump problem it would be nice
to be consistent in this regard.
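
To make the intended behaviour concrete, here is a rough user-space model of
the idea, not the actual patch: every early vector defaults to a fatal
handler, and only the NMI vector (2 on x86) gets a stub that returns without
doing anything. All names below (early_idt_setup, early_ignore_nmi, deliver,
and so on) are made up for illustration and do not correspond to symbols in
the kernel or in the patch set.

```c
/* Toy model of an early IDT that ignores NMIs. Hypothetical names only. */

#define NR_EARLY_VECTORS 32	/* x86 exception vectors 0..31 */
#define NMI_VECTOR	 2	/* NMI is delivered on vector 2 on x86 */

enum early_action { ACTION_HALT, ACTION_IGNORE };

typedef enum early_action (*early_handler_t)(int vector);

/* Number of NMIs swallowed while the early table was in place. */
static unsigned long ignored_nmis;

/* Default entry: any unexpected early exception stops the boot. */
static enum early_action early_fatal(int vector)
{
	(void)vector;
	return ACTION_HALT;
}

/* NMI entry: count the event and resume the interrupted code. */
static enum early_action early_ignore_nmi(int vector)
{
	(void)vector;
	ignored_nmis++;
	return ACTION_IGNORE;
}

static early_handler_t early_idt[NR_EARLY_VECTORS];

/* Fill the table with the fatal handler, then special-case the NMI. */
static void early_idt_setup(void)
{
	int i;

	for (i = 0; i < NR_EARLY_VECTORS; i++)
		early_idt[i] = early_fatal;
	early_idt[NMI_VECTOR] = early_ignore_nmi;
}

/* Stand-in for the CPU dispatching through the table. */
static enum early_action deliver(int vector)
{
	return early_idt[vector](vector);
}
```

In this model deliver(NMI_VECTOR) returns ACTION_IGNORE, while any other
vector, e.g. 13 (#GP), still returns ACTION_HALT, which mirrors the point
that only the NMI case changes.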

> because you don't want to
> solve the real problem which is you need to block this at the source.

Indeed, I want to do both: try to block NMIs at the source when
possible, and install an IDT that ignores NMIs to have our backs
covered (avoiding triple faults and lockups). As Eric mentioned,
it is not clear that we can always identify and stop all the sources.


> It is way more than the IDT that has to work (at the very least, you
> need the GDT and a working stack) at all times in order for NMIs to be
> receivable.

Of course, and that is what the follow-up patch set does. It just needs
some more testing. The patches I sent make use of the early GDT and
they work as expected.

> That doesn't address what happens if you're getting an NMI
> storm either.

Well, the same applies to the final IDT. As I mentioned before
I think we should also try to stop things at the source when
it is safe (of course, first we need to identify all the sources).

- Fernando