Re: [PATCH -v2 7/7] x86, NMI, Remove do_nmi_callback logic

From: huang ying
Date: Wed Sep 29 2010 - 02:56:05 EST


Hi, Don,

On Tue, Sep 28, 2010 at 11:19 PM, Don Zickus <dzickus@xxxxxxxxxx> wrote:
>> If NMI comes from watchdog, nmi_watchdog_tick() will return 1. So
>> do_nmi_callback() is NOT for watchdog NMI, but for unknown NMI. Why do
>> we call DIE_NMIWATCHDOG for unknown NMI (NOT watchdog NMI)? die_nmi is
>> for watchdog, not unknown NMI.
>
> I think watchdog is an overloaded term.  I was under the impression that
> once the nmi watchdog determined a problem, it called the DIE_NMIWATCHDOG
> die chain to see if any other drivers wanted to clean up or do their thing
> first before panic'ing (namely drivers in drivers/char/watchdog).

Yes. I think so too. In the original code, almost all of the DIE_NMIxxx
values are used this way:

DIE_NMI is raised after reading port 0x61, to see if any other driver
wants to recover from the error indicated by the reason read from port
0x61.

DIE_NMIWATCHDOG is used to see if any other driver wants to clean up
or do its thing before panicking.

DIE_NMIUNKNOWN is used to see if any other driver wants to clean up
or debug before the default unknown-NMI handling (such as panic).

DIE_NMI_IPI is used to see if any driver wants to process the NMI (an
NMI sent via the APIC? Perhaps that is where the name comes from).

So the original implementation of default_do_nmi() works like this:

- determine the reason/source of the NMI in default_do_nmi(), although
for some sources, such as perf, the exact reason/source is not
determined;

- call notify_die() with the die_val corresponding to the NMI
reason/source, to see if any driver wants to process it instead of
the default operation;

- if no other driver has processed it, perform the default operation,
such as panicking for DIE_NMIUNKNOWN.
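A minimal userspace model of that three-step flow, for illustration only. It is not the kernel's code: default_do_nmi_model(), notify_die_model(), set_die_notifier() and unknown_nmi_driver() are hypothetical names, the "die chain" holds a single handler instead of a notifier list, and it assumes that bits 0x80/0x40 of the port 0x61 value flag an error.

```c
#include <assert.h>
#include <stddef.h>

/* Return values as defined in include/linux/notifier.h. */
#define NOTIFY_DONE 0x0000
#define NOTIFY_STOP 0x8001

/* Simplified stand-ins for the die_val values discussed above. */
enum die_val { DIE_NMI, DIE_NMIWATCHDOG, DIE_NMIUNKNOWN };

/* Which path ended up handling the NMI (model only). */
enum nmi_outcome {
	HANDLED_BY_DRIVER,
	DEFAULT_PORT61_ERROR,   /* report memory parity / I/O check error */
	DEFAULT_WATCHDOG_PANIC,
	DEFAULT_UNKNOWN_NMI,    /* may panic */
};

/* Hypothetical one-slot "die chain"; the kernel uses a notifier list. */
typedef int (*die_notifier_fn)(enum die_val val, unsigned char reason);
static die_notifier_fn die_notifier;

static void set_die_notifier(die_notifier_fn fn)
{
	assert(die_notifier == NULL);   /* the model holds only one driver */
	die_notifier = fn;
}

static int notify_die_model(enum die_val val, unsigned char reason)
{
	return die_notifier ? die_notifier(val, reason) : NOTIFY_DONE;
}

/* Hypothetical driver that only wants to handle unknown NMIs. */
static int unknown_nmi_driver(enum die_val val, unsigned char reason)
{
	(void)reason;
	return val == DIE_NMIUNKNOWN ? NOTIFY_STOP : NOTIFY_DONE;
}

/*
 * (1) determine the source, (2) notify_die() for that source,
 * (3) fall back to the default operation.
 * ASSUMPTION: bits 0x80/0x40 of the port 0x61 value flag an error.
 * watchdog_lockup: the NMI watchdog detected a lockup.
 */
static enum nmi_outcome default_do_nmi_model(unsigned char reason,
					     int watchdog_lockup)
{
	if (reason & 0xc0) {
		if (notify_die_model(DIE_NMI, reason) == NOTIFY_STOP)
			return HANDLED_BY_DRIVER;
		return DEFAULT_PORT61_ERROR;
	}
	if (watchdog_lockup) {
		if (notify_die_model(DIE_NMIWATCHDOG, reason) == NOTIFY_STOP)
			return HANDLED_BY_DRIVER;
		return DEFAULT_WATCHDOG_PANIC;
	}
	if (notify_die_model(DIE_NMIUNKNOWN, reason) == NOTIFY_STOP)
		return HANDLED_BY_DRIVER;
	return DEFAULT_UNKNOWN_NMI;
}
```

With unknown_nmi_driver registered, an unknown NMI is claimed by the driver, while a watchdog lockup still takes the default (panic) path.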


The original implementation needs to be changed, because it uses only
port 0x61 to determine the reason/source of the NMI. We need an
order-based scheme to determine the reason/source of the NMI. The
order is as follows:

1. CPU-specific (CPU-local) NMI
2. non-CPU-specific (global) NMI
3. port 0x61
4. NMI watchdog
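A sketch of that ordering written as direct calls, one check per source, so the order is simply the order of the if statements. The struct nmi_sources snapshot, classify_nmi() and the SRC_* names are hypothetical; the examples in the comments are guesses, and the port 0x61 test assumes bits 0x80/0x40 flag an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what each NMI source would report (model only). */
struct nmi_sources {
	bool cpu_local;         /* a per-CPU source, e.g. a perf counter overflow */
	bool global;            /* a non-CPU-specific (platform-wide) source */
	unsigned char port61;   /* value read from port 0x61 */
	bool watchdog;          /* the NMI watchdog fired */
};

enum nmi_source { SRC_CPU_LOCAL, SRC_GLOBAL, SRC_PORT61, SRC_WATCHDOG, SRC_UNKNOWN };

/* Probe the sources in the agreed order; the first match wins. */
static enum nmi_source classify_nmi(const struct nmi_sources *s)
{
	assert(s != NULL);
	if (s->cpu_local)
		return SRC_CPU_LOCAL;
	if (s->global)
		return SRC_GLOBAL;
	if (s->port61 & 0xc0)   /* ASSUMPTION: 0x80/0x40 flag an error */
		return SRC_PORT61;
	if (s->watchdog)
		return SRC_WATCHDOG;
	return SRC_UNKNOWN;
}
```

If several sources are pending at once, only the earliest one in the list is reported, which is exactly what the ordering is meant to guarantee.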

I think we all agree on using an order to determine the reason/source
of the NMI. The difference is that I want to keep as many direct calls
in default_do_nmi() as possible, while you guys want to wrap almost all
of the code in default_do_nmi() into notifier handlers and leave only
one notify_die() call in default_do_nmi(). And I want to use different
die_val values (and their calling order in default_do_nmi()) to
determine the order, while you guys want to use the notifier priority
(based on its value) to determine the order.
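For comparison, a sketch of the priority-based alternative: handlers sit on one chain sorted by priority, and a single call walks it until one handler stops it. The NOTIFY_* values mirror include/linux/notifier.h and, as with kernel notifier chains, a higher priority runs earlier and equal priorities keep registration order; chain_register(), chain_call(), struct nmi_notifier and claim_if_mine() are hypothetical userspace stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Return values mirroring include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)

struct nmi_notifier {
	int (*call)(struct nmi_notifier *nb, int reason);
	int priority;               /* higher value is called earlier */
	int id;                     /* model only: the reason this handler owns */
	int calls;                  /* model only: how often it was called */
	struct nmi_notifier *next;
};

/* Insert sorted by descending priority; ties go after existing entries. */
static void chain_register(struct nmi_notifier **head, struct nmi_notifier *n)
{
	assert(n != NULL && n->call != NULL);
	while (*head && n->priority <= (*head)->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Call handlers in order until one returns a value with NOTIFY_STOP_MASK. */
static int chain_call(struct nmi_notifier *head, int reason)
{
	int ret = NOTIFY_DONE;

	for (struct nmi_notifier *n = head; n; n = n->next) {
		ret = n->call(n, reason);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Hypothetical handler: claims the NMI only if `reason` is its own id. */
static int claim_if_mine(struct nmi_notifier *nb, int reason)
{
	nb->calls++;
	return reason == nb->id ? NOTIFY_STOP : NOTIFY_DONE;
}
```

Here the order lives in the priority numbers rather than in the code of default_do_nmi(), which is the trade-off being discussed.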

On the other hand, I think we should call the corresponding DIE_NMIxxx
chain before the default operations: for the watchdog, call
DIE_NMIWATCHDOG before panicking; for an unknown NMI, call
DIE_NMIUNKNOWN before the default processing (which may panic).

I think it is important to distinguish between the die chain used to
determine the source/reason of the NMI and the die chain used to see
if any other driver wants to do some processing before the default
operation.
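A minimal model of keeping those two roles apart, under stated assumptions: ordered source handlers only decide who owns the NMI, and a separate pre-default hook is consulted with DIE_NMIWATCHDOG or DIE_NMIUNKNOWN just before the default operation, and may suppress it. handle_nmi_model(), run_default(), source_fn, pre_default_fn and the sample handlers are all hypothetical and not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum die_val { DIE_NMIWATCHDOG, DIE_NMIUNKNOWN };
enum nmi_result {
	NMI_HANDLED_BY_SOURCE,  /* role 1: a source handler owned the NMI */
	NMI_DEFAULT_SKIPPED,    /* role 2: a driver acted before the default */
	NMI_WATCHDOG_PANIC,     /* default operation for a watchdog lockup */
	NMI_UNKNOWN_DEFAULT,    /* default operation for an unknown NMI */
};

/* Role 1: returns true if this source owns (and handled) the NMI. */
typedef bool (*source_fn)(void);
/* Role 2: returns true to suppress the default operation for `val`. */
typedef bool (*pre_default_fn)(enum die_val val);

static enum nmi_result run_default(enum die_val val, pre_default_fn hook)
{
	if (hook && hook(val))
		return NMI_DEFAULT_SKIPPED;
	return val == DIE_NMIWATCHDOG ? NMI_WATCHDOG_PANIC : NMI_UNKNOWN_DEFAULT;
}

static enum nmi_result handle_nmi_model(const source_fn *sources, int n,
					bool watchdog_lockup, pre_default_fn hook)
{
	assert(n >= 0 && (sources != NULL || n == 0));
	/* Role 1: walk the ordered source handlers; the first owner wins. */
	for (int i = 0; i < n; i++)
		if (sources[i]())
			return NMI_HANDLED_BY_SOURCE;
	/* Role 2: pick the die_val for the default path, then ask drivers. */
	if (watchdog_lockup)
		return run_default(DIE_NMIWATCHDOG, hook);
	return run_default(DIE_NMIUNKNOWN, hook);
}

/* Hypothetical handlers used to exercise the model. */
static bool source_never(void) { return false; }
static bool source_always(void) { return true; }
static bool hook_claims_unknown(enum die_val val) { return val == DIE_NMIUNKNOWN; }
```

A pre-default hook that only cares about unknown NMIs never interferes with source identification, and it does not stop the watchdog panic.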

Best Regards,
Huang Ying