Re: [git pull] machine check recovery fix

From: Linus Torvalds
Date: Thu May 17 2012 - 18:45:57 EST


On Thu, May 17, 2012 at 10:10 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> Linus: Sent this to you on Monday as a patch:

So I really didn't like the patch.

I'm not entirely sure why I dislike it so much, but I don't like how
it seems to mix up the software rules and the hardware rules. They are
two totally separate things.

Also, the whole "nonrestartable state flag" means - if I understood
things correctly - that you really cannot do the "iret" even from the
NMI handler. So trying to push this into the whole process
notification seems entirely incorrect, because that still requires
that we return from the NMI - using the very machine state that we're
not supposed to use.

So I seriously believe the patch is wrong.

What I think *could* be right is something that says

- if the "can't restart" flag is set *AND* the state saved is
user-space, then we can treat the NMI as a regular interrupt (because
we're clearly not interrupting kernel mode), and we can kill the
process directly.

- if "can't restart" is set, and we're in kernel mode, we need to
panic (or, perhaps, just say "screw it, we don't have any choice,
we're going to try to restart anyway")
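
To make those two rules concrete, here is a minimal sketch in plain C. It
is not kernel code: every name in it (mce_ctx, mce_decide, the enum values)
is hypothetical and invented for illustration, and it only models the
decision, not the actual exception handling.

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of this is an existing kernel API. */
enum mce_action {
	MCE_RESUME,	/* saved state is restartable: return as usual */
	MCE_KILL_TASK,	/* interrupted user space: kill the process directly */
	MCE_PANIC	/* interrupted kernel mode: no safe way to continue */
};

struct mce_ctx {
	bool cant_restart;	/* hardware says the saved state cannot be restarted */
	bool user_mode;		/* the saved state belongs to user space */
};

/*
 * Hardware rule first (can we restart at all?), then the software rule
 * (what were we interrupting?). The two are kept as separate checks.
 */
enum mce_action mce_decide(const struct mce_ctx *ctx)
{
	if (!ctx->cant_restart)
		return MCE_RESUME;

	if (ctx->user_mode)
		return MCE_KILL_TASK;

	/*
	 * Kernel mode with non-restartable state. The alternative is to
	 * attempt the restart anyway, since there is no good choice here.
	 */
	return MCE_PANIC;
}
```

The point of the sketch is only that the user/kernel test is made
explicitly, instead of falling out of the notify bit as a side effect.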

I guess the notify bit kind of emulates that "if the NMI happened in
user space" thing, but it seems to really do that more by mistake than
by design. Or at least it doesn't seem to be explicitly documented as
being intentional.

I dunno. I'm just very uncomfortable with the patch.

Linus