Re: [FYI PATCH 0/7] Mitigation for CVE-2018-12207

From: Pawan Gupta
Date: Wed Nov 13 2019 - 18:32:00 EST


On Wed, Nov 13, 2019 at 09:23:30AM +0100, Paolo Bonzini wrote:
> On 13/11/19 07:38, Jan Kiszka wrote:
> > When reading MCE, error code 0150h, ie. SRAR, I was wondering if that
> > couldn't simply be handled by the host. But I suppose the symptom of
> > that erratum is not "just" regular recoverable MCE, rather
> > sometimes/always an unrecoverable CPU state, despite the error code, right?
>
> The erratum documentation talks explicitly about hanging the system, but
> it's not clear if it's just a result of the OS mishandling the MCE, or
> something worse. So I don't know. :( Pawan, do you?

As Dave mentioned in the other email, it's "something worse".

Although this erratum results in a machine check with the same MCACOD
signature as an SRAR error (0x150), the MCi_STATUS.PCC bit will be set to
one. The Intel Software Developer's Manual says that PCC=1 errors are
fatal and cannot be recovered.

15.10.4.1 Machine-Check Exception Handler for Error Recovery [1]

[...]
The PCC flag in each IA32_MCi_STATUS register indicates whether recovery
from the error is possible for uncorrected errors (UC=1). If the PCC
flag is set for enabled uncorrected errors (UC=1 and EN=1), recovery is
not possible.

Thanks,
Pawan

[1]
https://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-software-developer-vol-3b-part-2-manual.html