Re: [Patch v3 03/11] driver/edac/mpc85xx_edac: Drop setting/clearing RFXE bit in HID1

From: Borislav Petkov
Date: Mon Aug 08 2016 - 03:11:38 EST


On Thu, Aug 04, 2016 at 03:58:28PM -0700, York Sun wrote:
> On e500v1, read fault exception enable (RFXE) controls whether
> assertion of core_fault_in causes a machine check interrupt.
> Assertion of core_fault_in can result from uncorrectable data
> error, such as an L2 multibit ECC error. It can also occur from
> a system error if logic on the integrated device signals a fault
> for nonfatal errors. RFXE bit is cleared out of reset, and should
> be left clear for normal operation. Assertion of core_fault_in does
> not cause a machine check.
>
> RFXE is set specifically for RIO (Rapid IO) and PCI for book E to
> catch the errors by machine check. With this bit set, EDAC driver
> can't get the interrupt in case of uncorrectable error. So this
> bit is cleared in favor of EDAC. However, the benefit of catching
> such uncorrectable error doesn't outweight the other errors which
> may hang the system. Beside, e500v2 has different errors maksed
> by RFXE, and e500mc doesn't support this bit. It is more reasonable
> to leave RFXE as is in EDAC driver, and leave the uncorrectable
> errors triggering machine check for e500v1.

Very nice, thanks for expanding it!

Two final remarks:

- please use a spell checker

- now, what happens if you leave RFXE clear and mpc85xx_edac gets the
  error? Is it going to do proper error handling of the uncorrectable
  error or are we better off handling the error in the #MC interrupt
  handler?

IOW, is mpc85xx_edac well equipped to handle those multibit errors or
should we leave the current setting as is?
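
To make the two options concrete, here is a rough userspace sketch of the
save/clear/restore handling of RFXE that this patch drops. It is only an
illustration, not the driver code: fake_hid1 and the hid1_*/rfxe_* helpers
are hypothetical stand-ins for the mfspr()/mtspr() accesses to HID1, and
the HID1_RFXE mask value is assumed from the kernel's Book E register
header and should be checked there.

```c
#include <stdint.h>

/* Assumed mask for the RFXE bit in HID1; verify against reg_booke.h. */
#define HID1_RFXE 0x00020000u

/* Hypothetical stand-in for the per-core HID1 special purpose register. */
static uint32_t fake_hid1;

/* In the kernel these would be mfspr(SPRN_HID1) / mtspr(SPRN_HID1, v). */
static uint32_t hid1_read(void) { return fake_hid1; }
static void hid1_write(uint32_t v) { fake_hid1 = v; }

/*
 * Current behaviour (the one being dropped): remember the original HID1
 * value and clear RFXE so that core_fault_in no longer raises a machine
 * check and the EDAC interrupt path sees the uncorrectable error instead.
 */
static uint32_t rfxe_clear_and_save(void)
{
	uint32_t orig = hid1_read();

	hid1_write(orig & ~HID1_RFXE);
	return orig;
}

/* On driver removal, put back whatever HID1 held before. */
static void rfxe_restore(uint32_t orig)
{
	hid1_write(orig);
}

/* Non-zero when core_fault_in would be routed to a machine check. */
static int rfxe_is_set(void)
{
	return (hid1_read() & HID1_RFXE) != 0;
}
```

With the patch applied, none of the above runs and rfxe_is_set() simply
reflects whatever the platform code configured, so on e500v1 the error
ends up in the machine check handler.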

Thanks.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--