Re: [PATCH 5/6] x86, mce: handle "action required" errors

From: Chen Gong
Date: Wed Dec 14 2011 - 21:56:46 EST


On 2011/12/15 5:30, Tony Luck wrote:
On Wed, Dec 14, 2011 at 1:28 AM, Chen Gong <gong.chen@xxxxxxxxxxxxxxx> wrote:
-	if (kill_it && tolerant < 3)
+	if (worst != MCE_AR_SEVERITY && kill_it && tolerant < 3)
 		force_sig(SIGBUS, current);


I think more comments should be added here to clarify why the *AR* case
is not killed. For example: "For SRAR errors, such as DCU/IFU errors, it
is reasonable for RIPV to be 0 on the affected logical processors."
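To make the patched condition concrete, here is a minimal user-space sketch of the decision it encodes, with the suggested explanation attached as a comment. The enum `mce_sev`, its values, and the function `should_force_sigbus` are hypothetical illustrations, not the kernel's real definitions; only the shape of the condition is taken from the diff.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MCE severity levels; the real
 * names, values and ordering in the kernel's MCE code may differ. */
enum mce_sev {
	SEV_NONE,
	SEV_AO,		/* "action optional" */
	SEV_AR,		/* "action required" (SRAR) */
	SEV_PANIC,
};

/*
 * Mirrors the patched check: force SIGBUS on the current task only when
 * the worst error is not action-required, the handler has decided the
 * task must die (kill_it), and tolerant is below 3.
 *
 * For SRAR errors, such as DCU/IFU errors, RIPV is expected to be 0 on
 * the affected logical processors, so kill_it alone is not a reason to
 * kill: recovery is attempted first.
 */
static bool should_force_sigbus(enum mce_sev worst, int kill_it, int tolerant)
{
	return worst != SEV_AR && kill_it && tolerant < 3;
}
```

With `worst == SEV_AR` the predicate is false even when `kill_it` is set, which is exactly the case the review asks to have documented.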

I'll look at this. The reason not to kill for AR is that we want to try to
recover first (e.g. the page could be re-read from disk into a different
physical page). In some cases we can recover transparently to the application.

Oh, yes, these are important reasons for not killing on *AR* events. But my
point is that in an environment that supports *AR*, "kill_it" should not be
set to true as in the code below:
	if (!(m.mcgstatus & MCG_STATUS_RIPV))
		kill_it = 1;

The reason is what I said before: for SRAR errors, RIPV can legitimately be
0. But at that point the worst severity has not yet been determined, so we
have to wait until it is known.
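The alternative proposed here is to stop setting kill_it from RIPV alone and evaluate RIPV only once the worst severity is known. A minimal sketch of that ordering follows, assuming the `tolerant < 3` check is still applied by the caller. `MCG_STATUS_RIPV` is defined locally as bit 0 of IA32_MCG_STATUS, matching the architectural definition; the severity enum and the function `decide_kill_it` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* RIPV ("restart IP valid") is bit 0 of IA32_MCG_STATUS. */
#define MCG_STATUS_RIPV (1ULL << 0)

/* Hypothetical severity levels, for illustration only. */
enum mce_sev { SEV_NONE, SEV_AO, SEV_AR, SEV_PANIC };

/*
 * Decide kill_it only after the worst severity has been computed.
 * For an action-required (SRAR) error, RIPV == 0 is expected on the
 * affected logical processor, so it must not by itself mark the task
 * for killing; recovery is attempted first instead.
 */
static bool decide_kill_it(uint64_t mcgstatus, enum mce_sev worst)
{
	if (worst == SEV_AR)
		return false;
	return !(mcgstatus & MCG_STATUS_RIPV);
}
```

The design choice is purely one of ordering: with the RIPV test moved after the severity scan, the existing `kill_it && tolerant < 3` check would not need an extra severity term.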

Anyway, it is an interesting coincidence, isn't it? :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/