[PATCH V2 0/6] Enable deferred error interrupts

From: Aravind Gopalakrishnan
Date: Wed May 06 2015 - 13:02:04 EST


Deferred errors indicate error conditions that were not corrected, but
require no action from S/W (or action is optional). These errors provide
info about a latent UC MCE that can occur when poisoned data is
consumed by the processor.

Newer AMD processors can generate deferred errors and can be configured
to generate APIC interrupts on such events.
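
As a rough illustration of how a deferred error differs from an uncorrected
or corrected one, here is a minimal userspace sketch that classifies a raw
MCi_STATUS value. The bit positions follow the MCI_STATUS_* definitions in
arch/x86/include/asm/mce.h as I understand them; classify_status() and the
err_class enum are hypothetical names used only for this example and are not
part of the patchset.

```c
#include <stdint.h>

/* Bit positions assumed from arch/x86/include/asm/mce.h */
#define MCI_STATUS_VAL      (1ULL << 63)  /* register holds a valid error */
#define MCI_STATUS_UC       (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_DEFERRED (1ULL << 44)  /* AMD: deferred error */

enum err_class { ERR_NONE, ERR_CORRECTED, ERR_DEFERRED, ERR_UNCORRECTED };

/* Hypothetical helper: decode the error class from a status value. */
static enum err_class classify_status(uint64_t status)
{
	/* Nothing logged in this bank. */
	if (!(status & MCI_STATUS_VAL))
		return ERR_NONE;

	/* Uncorrected errors take priority over everything else. */
	if (status & MCI_STATUS_UC)
		return ERR_UNCORRECTED;

	/* Not corrected, but no immediate action required from S/W. */
	if (status & MCI_STATUS_DEFERRED)
		return ERR_DEFERRED;

	return ERR_CORRECTED;
}
```

A status with only the valid and deferred bits set is reported as
ERR_DEFERRED; the same value without the valid bit is treated as no error.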

This patchset introduces a new interrupt handler for deferred errors and
configures the HW if the feature is present.

Patch1: Factor out the logging mechanism so we can reuse it for deferred
        errors. No functional change.
Patch2: Read MCx_ADDR(bank) before calling mce_log(). This fixes an issue
        where amd_decode_mce() always prints the error address as 0x0,
        even if a valid address exists.
Patch3: Define the SUCCOR cpuid bit. This indicates the presence of
        features such as data poisoning and deferred error interrupts in
        hardware.
Patch4: Implement the interrupt handler.
        - Set up the vector number, build the interrupt and implement the
          handler function in this patch.
Patch5, Patch6: Cleanups in the code. No functional changes are introduced.
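
To make the ordering issue behind Patch2 concrete, here is a small userspace
sketch, not the kernel code itself. The record has to be filled with the
bank's address before it is handed to the logger, otherwise the decoder only
ever sees 0x0. struct mce_record, fill_record() and fake_read_addr() are
hypothetical stand-ins (the kernel uses struct mce and an MSR read), and
gating the read on MCI_STATUS_ADDRV is an assumption of this sketch.

```c
#include <stdint.h>

/* Bit positions assumed from arch/x86/include/asm/mce.h */
#define MCI_STATUS_VAL   (1ULL << 63)  /* register holds a valid error */
#define MCI_STATUS_ADDRV (1ULL << 58)  /* MCx_ADDR holds a valid address */

/* Hypothetical, simplified stand-in for the kernel's struct mce. */
struct mce_record {
	uint64_t status;
	uint64_t addr;
	int bank;
};

/* Stand-in for reading MCx_ADDR(bank); a real reader would use an MSR. */
typedef uint64_t (*read_addr_fn)(int bank);

/* Fake reader for demonstration: returns a made-up per-bank address. */
static uint64_t fake_read_addr(int bank)
{
	return 0x1000u + (uint64_t)bank;
}

/*
 * Fill the record completely before it is logged.
 * Returns 1 if there is something to log, 0 otherwise.
 */
static int fill_record(struct mce_record *m, int bank, uint64_t status,
		       read_addr_fn read_addr)
{
	if (!(status & MCI_STATUS_VAL))
		return 0;

	m->bank = bank;
	m->status = status;
	m->addr = 0;

	/* Read the address first, so the logger and decoder can see it. */
	if (status & MCI_STATUS_ADDRV)
		m->addr = read_addr(bank);

	return 1;
}
```

With fake_read_addr(), a valid status with the address-valid bit set on bank
4 yields addr == 0x1004; without that bit the address stays 0.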

Changes from V1:
- Two prepatches:
  * Factor out the logging mechanism so we can reuse it for deferred errors
  * Read MCx_ADDR(bank) before calling mce_log() so we get the relevant
    error address printed in the kernel logs
- Providing short description of Deferred errors here as well as in commit
message of patch2 (per Ingo, Boris)
- Adding comments around mce_flags to define the bitfields better (per Boris)
- Assign truth values using double negation and 'BIT' macros. Vertically
align statements while at it. (per Boris)
- Change definitions of 'deferred_interrupt' to 'deferred_error_interrupt';
DEFERRED_APIC_VECTOR to DEFERRED_ERROR_VECTOR and irq_deferred_count
to irq_deferred_error_count (per Andy, Boris)
- Do the BIOS workaround check for all families as we are behind a cpuid
bit anyway. And print a FW_BUG message as needed. (per Boris)
- Updating the timestamp of patch to May 2015 in mce_amd.c
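
The "double negation and 'BIT' macros" item above can be sketched in
userspace as follows. The point is that a one-bit bitfield keeps only the
lowest bit of what is assigned to it, so assigning a raw mask result such as
(ebx & BIT(1)) == 2 would store 0; the double negation collapses any nonzero
value to 1 first. The struct and function names here are hypothetical, and
the bit positions (EBX[0] for overflow recovery, EBX[1] for SUCCOR in CPUID
Fn8000_0007) are stated as I understand them from AMD's documentation.

```c
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Hypothetical flags struct, modelled loosely on mce_flags. */
struct vendor_flags {
	unsigned int overflow_recov : 1;  /* assumed: CPUID Fn8000_0007 EBX[0] */
	unsigned int succor         : 1;  /* assumed: CPUID Fn8000_0007 EBX[1] */
};

/* Decode the feature bits from a CPUID Fn8000_0007 EBX value. */
static struct vendor_flags decode_flags(uint32_t ebx)
{
	struct vendor_flags f = { 0, 0 };

	/* !! turns any nonzero mask result into exactly 1. */
	f.overflow_recov = !!(ebx & BIT(0));
	f.succor         = !!(ebx & BIT(1));

	return f;
}
```

For example, decode_flags(0x2) sets succor and leaves overflow_recov clear.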

Aravind Gopalakrishnan (6):
x86/MCE/AMD: Factor out logging mechanism
x86/MCE/AMD: Read MCx_ADDR(bank) before we log the error
x86/mce: Define 'SUCCOR' cpuid bit
x86/MCE/AMD: Introduce deferred error interrupt handler
x86, irq: Cleanup ordering of vector numbers
x86/MCE/AMD: Rename setup_APIC_mce

arch/x86/include/asm/entry_arch.h | 3 +
arch/x86/include/asm/hardirq.h | 3 +
arch/x86/include/asm/hw_irq.h | 2 +
arch/x86/include/asm/irq_vectors.h | 11 +--
arch/x86/include/asm/mce.h | 20 ++++-
arch/x86/include/asm/trace/irq_vectors.h | 6 ++
arch/x86/include/asm/traps.h | 3 +-
arch/x86/kernel/cpu/mcheck/mce.c | 3 +-
arch/x86/kernel/cpu/mcheck/mce_amd.c | 132 ++++++++++++++++++++++++++++---
arch/x86/kernel/entry_64.S | 5 ++
arch/x86/kernel/irq.c | 6 ++
arch/x86/kernel/irqinit.c | 4 +
arch/x86/kernel/traps.c | 5 ++
13 files changed, 182 insertions(+), 21 deletions(-)

--
1.9.1
