Re: [PATCH 2/4] x86/mce/amd: Introduce deferred error interrupt handler

From: Aravind Gopalakrishnan
Date: Mon May 04 2015 - 15:07:08 EST


On 5/4/2015 1:46 PM, Borislav Petkov wrote:
For deferred errors, the workaround is a little different as it
applies to only the given family/model right now. If the workaround
needs to be applied for future processors, we can extend the family
check for those, right?
Or, you can do the check for all families as we're behind a CPUID bit
anyway. This is why CPUID bits are a good thing :-)

Yep. Ok, will do that.

If we set up 'm.addr' in amd_threshold_interrupt() and
amd_deferred_error_interrupt() properly, then amd_decode_mce() would
actually have some value in m->addr to report.

I didn't mean to say HW doesn't provide us the information in the addr
and/or the misc registers.
So you can use mce_read_aux(), yeah, you can move it to mce-internal.h
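
For illustration only, here is a userspace sketch of the kind of conditional
read being discussed: copy the addr and misc values into the record only when
the corresponding valid bits are set in the status word. The struct names and
the read_aux_sketch() helper are hypothetical stand-ins, not the kernel's
struct mce or the actual mce_read_aux(); the two bit definitions are assumed
to match MCI_STATUS_MISCV and MCI_STATUS_ADDRV in the kernel headers.

```c
#include <stdint.h>

/* Assumed to match the kernel's definitions of these status bits. */
#define MCI_STATUS_MISCV (1ULL << 59)	/* MCx_MISC holds valid data */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* MCx_ADDR holds valid data */

/* Hypothetical, cut-down stand-in for the kernel's struct mce. */
struct mce_sketch {
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
};

/* Hypothetical stand-in for the raw register contents of one bank. */
struct bank_regs {
	uint64_t addr;
	uint64_t misc;
};

/*
 * Fill in addr/misc only if the status word says they are valid.
 * A real implementation would read the MSRs instead of a struct.
 */
static void read_aux_sketch(struct mce_sketch *m, const struct bank_regs *regs)
{
	if (m->status & MCI_STATUS_MISCV)
		m->misc = regs->misc;

	if (m->status & MCI_STATUS_ADDRV)
		m->addr = regs->addr;
}
```

With a helper of this shape, both interrupt handlers could hand the decoder a
populated m->addr whenever the hardware reports one.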


Ok, will do.
Is it ok to add another patch in V2 for this, instead of fixing it in this patch, since it's a real bug?
That should be helpful when someone wants to look up the git log to see why this was done.

The addr, misc registers are still valid for threshold, deferred errors.
(Of course, misc is valid only if m->status & MCI_STATUS_MISCV)

My point was, in __log_error(), we can read relevant status and addr MSRs to
be passed to mce_log() as those are the only pieces of information we use in
the decoding chain; and discard the m.misc assignment we do for threshold
errors.
But MCx_MISC is important for thresholding errors, it carries the ErrCnt
and stuff.

So you can pass a parameter to __log_error(..., threshold=true, misc)
and do

if (threshold)
m.misc = misc;

Right?


Yeah, I just wanted to keep __log_error() as generic as possible and not special-case threshold errors.
But ok, since MCx_MISC is needed, I'll work it up as you suggested.
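
For illustration only, the suggestion above could be sketched in userspace
roughly as follows. This is not the actual patch: struct mce_rec and
log_error_sketch() are hypothetical names, the status and addr values are
passed in as arguments where the real code would read MSRs, and the record is
returned instead of being handed to mce_log().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's definition of this status bit. */
#define MCI_STATUS_ADDRV (1ULL << 58)	/* MCx_ADDR holds valid data */

/* Hypothetical, cut-down stand-in for the kernel's struct mce. */
struct mce_rec {
	unsigned int bank;
	uint64_t status;
	uint64_t addr;
	uint64_t misc;
};

/*
 * Build one error record. The misc value is kept only for threshold
 * errors, where it carries the error count; deferred errors leave it 0.
 */
static struct mce_rec log_error_sketch(unsigned int bank, uint64_t status,
				       uint64_t addr, bool threshold,
				       uint64_t misc)
{
	struct mce_rec m = { .bank = bank, .status = status };

	if (status & MCI_STATUS_ADDRV)
		m.addr = addr;

	if (threshold)
		m.misc = misc;

	return m;
}
```

The threshold handler would call this with threshold set to true and the
MCx_MISC value it already has; the deferred error handler would pass false.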

Thanks,
-Aravind.