Re: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update

From: Andy Lutomirski
Date: Mon Aug 29 2022 - 10:23:31 EST


On 8/17/22 08:06, Ashok Raj wrote:
On Wed, Aug 17, 2022 at 04:19:40PM +0200, Borislav Petkov wrote:
On Wed, Aug 17, 2022 at 12:30:49PM +0000, Ashok Raj wrote:
You will find out when the system returns after reboot, and hopefully it
wasn't promoted to a cold boot, which will lose the MCE banks.
Not good enough!
I probably misread your question. Are you suggesting we add some WARN when
we initiate late_load? I thought you were asking if the HW must signal
something and the OS should log when an MCE happens if MCIP=1.


This should issue a warning in dmesg that a potential MCE while the update
is running would cause a lockup. That is, if we don't disable MCE around
it.

If we decide to disable MCE, it should say shutdown.
Ok, that clarifies it.. "IF we choose to set MCIP=1, we should tell users
that hell can break loose, get under the table" :-)

Meaning deal with the effect of a really rare MCE, rather than trying to
avoid it. Taking the MCE is more important than finishing the update
and losing what the signaled error was trying to convey.
Right now I'm inclined to not do anything and warn of a potential rare
situation.
Encouraging.. So I'll drop that patch from the list next time around.


If I followed all this correctly, I agree. If we set MCIP to force a crash when we get an MCE, then any MCE during the update is guaranteed to crash the machine. If we don't, then we might crash.


An imperfect alternative would be to set a (percpu?) flag that we're doing a ucode update and then detect that flag early in the MCE handler and warn very loudly.  This seems like it will give us the best chance of getting a useful diagnostic.
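
A minimal sketch of the flag idea, written as a user-space simulation and not as kernel code. Every name here (SIM_NR_CPUS, ucode_update_in_progress, the sim_* helpers) is hypothetical and made up for illustration; a real version would presumably use a per-CPU variable and check it early in the machine check entry path, and nothing below reflects what the patch series actually does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define SIM_NR_CPUS 4	/* hypothetical CPU count for this simulation */

/*
 * Hypothetical stand-in for a per-CPU "microcode update in progress" flag.
 * A static array is zero-initialized, so every flag starts out false.
 */
static atomic_bool ucode_update_in_progress[SIM_NR_CPUS];

/* Mark the start of a (simulated) microcode update on @cpu. */
static void sim_ucode_update_begin(unsigned int cpu)
{
	assert(cpu < SIM_NR_CPUS);
	atomic_store(&ucode_update_in_progress[cpu], true);
}

/* Mark the end of a (simulated) microcode update on @cpu. */
static void sim_ucode_update_end(unsigned int cpu)
{
	assert(cpu < SIM_NR_CPUS);
	atomic_store(&ucode_update_in_progress[cpu], false);
}

/*
 * Simulated early check in the MCE handler: if the flag is set for @cpu,
 * warn loudly before doing anything else. Returns true if a warning was
 * issued, false otherwise.
 */
static bool sim_mce_handler_entry(unsigned int cpu)
{
	assert(cpu < SIM_NR_CPUS);
	if (atomic_load(&ucode_update_in_progress[cpu])) {
		fprintf(stderr,
			"WARNING: MCE on CPU %u during microcode update\n",
			cpu);
		return true;
	}
	return false;
}
```

The update path would call sim_ucode_update_begin() before starting and sim_ucode_update_end() once done; an MCE that lands in between hits the warning first, which is the diagnostic this is after. It does nothing to make the MCE itself survivable.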