Re: [PATCH v3 8/9] ACPI, APEI, CPER: Cleanup CPER memory erroroutput format

From: Borislav Petkov
Date: Tue Oct 22 2013 - 04:42:24 EST


On Mon, Oct 21, 2013 at 05:14:05PM +0000, Luck, Tony wrote:
> Even if we recovered from a UC error (which is by no means a sure
> thing) ... I don't think the "requires no further action" message
> applies.
>
> Soft single bit errors are common (well, common-ish ... they should
> still be somewhat rare by most objective standard). Double bit errors
> are much rarer ... and are very unlikely to be the result of two
> single bit errors happening to be inside the same cache line. I'd
> recommend further investigation of the source of a UC error (even one
> that is "recovered" in software).

Btw, do we even need to make this distinction? I mean, do we even reach
this path on an error where we need to raise a #MC exception? In the
initial design we were called from machine_check_poll which is not the
exception path and now we're on the decode_chain which gets all errors.

Are we ready to handle all? And also, why do we even need to
differentiate the error types on reporting? I mean, if it is, say, a
contained UC error and we can start a recovery action from userspace
like killing the process, we probably want to have that same detailed
report too?

[ This is purely hypothetical, of course, as we do the poisoning game
and killing of processes from kernel space now but still... ]

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/