Re: 64bit x86: NMI nesting still buggy?

From: Steven Rostedt
Date: Tue Apr 29 2014 - 15:16:12 EST


On Tue, 29 Apr 2014 20:48:34 +0200 (CEST)
Jiri Kosina <jkosina@xxxxxxx> wrote:

> On Tue, 29 Apr 2014, Steven Rostedt wrote:
>
> > > Just to be clear here -- I don't have a box that can reproduce this; I
> > > whole-heartedly believe that even if there are boxes with this behavior
> > > (and I assume there are, otherwise Intel wouldn't be mentioning it in the
> > > docs), it'd be hard to trigger on those.
> >
> > I see your point. But it is documented for those that control both NMIs
> > and SMIs. As it says in the document: "If the SMI handler requires the
> > use of NMI interrupts". That to me sounds like a system that has
> > control over both SMIs *and* NMIs. The BIOS should not have any control
> > over NMIs, as the OS requires that. And the OS has no control over
> > SMIs.
> >
> > That paragraph sounds irrelevant to normal BIOS and OS systems as
> > neither "owns" both SMIs and NMIs.
>
> Which doesn't really make me any less nervous about this whole thing.
>
> I don't believe Intel would put a completely arbitrary and nonsensical
> paragraph into the manual all of a sudden. It'd be great to know the
> rationale why this has been added in the first place.

Honestly, it doesn't seem to be stating policy; it seems to be stating
"what happens if I do this". Again, BIOS writers need to be more
concerned about what the OS might need. They should not be changing the
way NMIs work behind the OS's back. The OS has no protection from this
at all. It's just like the bug I had reported where the BIOS writers caused
the second PIT to get corrupted. The bug was on their end.

>
> > > We were hunting something completely different, and came through this
> > > paragraph in the Intel manual, and found it rather scary.
> >
> > But this is all irrelevant anyway as this is all hypothetical and
> > there's been no real world bug with this.
>
> One would hope. Again -- I believe if this triggered here and there a
> few times a year, everyone would probably attribute it to a "random hang",
> reboot, and never see the bug again.
>

I highly doubt it. It would cause issues on all the systems that run an
NMI watchdog. There are enough of those out there that a random hang will
raise an eyebrow.

And it would trigger much more often on systems that don't do the
tricks we do with my changes. There are a lot of those out there too.

I wouldn't be losing any sleep over this.

-- Steve