Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

From: Borislav Petkov
Date: Sun May 03 2015 - 05:22:46 EST


On Thu, Apr 30, 2015 at 09:49:23AM -0500, Aravind Gopalakrishnan wrote:
> Changes introduced in the patch-
> - Assign vector number 0xf4 for Deferred errors
> - Declare deferred_interrupt, allocate gate and bind it
> to DEFERRED_APIC_VECTOR.
> - Declare smp_deferred_interrupt to be used as the
> entry point for the interrupt in mce_amd.c
> - Define trace_deferred_interrupt for tracing
> - Enable deferred error interrupt selectively upon detection
> of 'succor' bitfield
> - Setup amd_deferred_error_interrupt() to handle the interrupt
> and assign it to def_int_vector if feature is present in HW.
> Else, let default handler deal with it.
> - Provide Deferred error interrupt stats on
> /proc/interrupts by incrementing irq_deferred_count

This commit message should explain the feature in more high-level way,
what is it good for and so on, not what you're adding.

That I can see. :-)

> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@xxxxxxx>
> ---
> arch/x86/include/asm/entry_arch.h | 3 +
> arch/x86/include/asm/hardirq.h | 3 +
> arch/x86/include/asm/hw_irq.h | 2 +
> arch/x86/include/asm/irq_vectors.h | 1 +
> arch/x86/include/asm/mce.h | 3 +
> arch/x86/include/asm/trace/irq_vectors.h | 6 ++
> arch/x86/include/asm/traps.h | 3 +-
> arch/x86/kernel/cpu/mcheck/mce_amd.c | 101 +++++++++++++++++++++++++++++++
> arch/x86/kernel/entry_64.S | 5 ++
> arch/x86/kernel/irq.c | 6 ++
> arch/x86/kernel/irqinit.c | 4 ++
> arch/x86/kernel/traps.c | 4 ++
> 12 files changed, 140 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
> index dc5fa66..f7b957b 100644
> --- a/arch/x86/include/asm/entry_arch.h
> +++ b/arch/x86/include/asm/entry_arch.h
> @@ -50,4 +50,7 @@ BUILD_INTERRUPT(thermal_interrupt,THERMAL_APIC_VECTOR)
> BUILD_INTERRUPT(threshold_interrupt,THRESHOLD_APIC_VECTOR)
> #endif
>
> +#ifdef CONFIG_X86_MCE_AMD
> +BUILD_INTERRUPT(deferred_interrupt, DEFERRED_APIC_VECTOR)

All the other names are written out so you can simply do

BUILD_INTERRUPT(deferred_error_interrupt, DEFERRED_ERROR_VECTOR)

> +#endif
> #endif
> diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h
> index 0f5fb6b..448451c 100644
> --- a/arch/x86/include/asm/hardirq.h
> +++ b/arch/x86/include/asm/hardirq.h
> @@ -33,6 +33,9 @@ typedef struct {
> #ifdef CONFIG_X86_MCE_THRESHOLD
> unsigned int irq_threshold_count;
> #endif
> +#ifdef CONFIG_X86_MCE_AMD
> + unsigned int irq_deferred_count;

Right
unsigned int irq_deferred_error_count;

> +#endif
> #if IS_ENABLED(CONFIG_HYPERV) || defined(CONFIG_XEN)
> unsigned int irq_hv_callback_count;
> #endif
> diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
> index e9571dd..7cf2670 100644
> --- a/arch/x86/include/asm/hw_irq.h
> +++ b/arch/x86/include/asm/hw_irq.h

...

> +static void deferred_error_interrupt_enable(struct cpuinfo_x86 *c)
> +{
> + u32 low = 0, high = 0;
> + int def_offset = -1, def_new;
> +
> + if (rdmsr_safe(MSR_CU_DEF_ERR, &low, &high))
> + return;
> +
> + def_new = (low & MASK_DEF_LVTOFF) >> 4;
> + if (c->x86 == 0x15 && c->x86_model == 0x60 &&
> + !(low & MASK_DEF_LVTOFF)) {

What's the family check for? for BIOSes which don't set the LVT offset
to 2, as they should?

If so, we probably should say

pr_err(FW_BUG "Your BIOS is not setting up LVT offset 0x2 for deferred error IRQs correctly.\n");

or similar...

> + def_new = DEF_LVT_OFF;
> + low = (low & ~MASK_DEF_LVTOFF) | (DEF_LVT_OFF << 4);
> + }
> +
> + def_offset = setup_APIC_deferred_error(def_offset, def_new);
> +
> + if ((def_offset == def_new) &&
> + (def_int_vector != amd_deferred_error_interrupt))
> + def_int_vector = amd_deferred_error_interrupt;
> +
> + low = (low & ~MASK_DEF_INT_TYPE) | DEF_INT_TYPE_APIC;
> + wrmsr(MSR_CU_DEF_ERR, low, high);
> +}

...

> +/* Apic interrupt handler for deferred errors */
> +static void amd_deferred_error_interrupt(void)
> +{
> + u64 status;
> + unsigned int bank;
> + struct mce m;
> +
> + for (bank = 0; bank < mca_cfg.banks; ++bank) {
> + rdmsrl(MSR_IA32_MCx_STATUS(bank), status);
> +
> + if (!(status & MCI_STATUS_VAL) ||
> + !(status & MCI_STATUS_DEFERRED))
> + continue;
> +
> + mce_setup(&m);
> + m.bank = bank;
> + m.status = status;
> + mce_log(&m);
> + wrmsrl(MSR_IA32_MCx_STATUS(bank), 0);
> + break;
> + }

That's very similar to what we do in the end of
amd_threshold_interrupt(). You could add a generic __log_error() static
helper in a pre-patch and then call it here.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/