Re: [PATCH] x86/microcode: Add an option to reload microcode even if revision is unchanged

From: Thomas Gleixner
Date: Tue Sep 17 2019 - 02:37:45 EST


Raj,

On Mon, 16 Sep 2019, Raj, Ashok wrote:
> On Mon, Sep 16, 2019 at 12:36:11PM +0200, Thomas Gleixner wrote:
> > That said, there is also a distinct lack of information about micro code
> > loading in a safe way in general. We absolutely do not know whether a micro
> > code update affects any instruction which might be in use during the update
> > on a sibling. Right now it's all load and pray and the SDM is not really
> > helpful with that either.
> >
>
> Guilty as charged :-). In general we do not expect microcode updates to
> remove any cpuid bits (Not that it hasn't happened, but it slipped through
> the cracks).
>
> microode updates should be of 3 types.
>
> - Only loadable from BIOS (Only via FIT tables)
> - Suitable for early load (things that take cpuid bits for e.g.)
> - Suitable for late-load. (Where no cpuid bits should change etc).
>
> Today the way we load after a stop_machine() all threads in the system are
> held hostage until all the cores have done the update. The thread sibling
> is also in the rendezvous loop.

I know. See below.

> Do you think we still have that risk with a sibling thread?
> (Assuming future ucodes don't do weird things like what happened in
> that case where a cpuid was removed via an update)

Well, yes. The sibling executes a limited set of instructions in a loop,
but it might be hit by an NMI or MCE which executes even more instructions.

So what happens if the ucode update "fixes" one of the executed
instructions on the fly? Is that guaranteed to be safe? There is nothing
which says so.

A decade ago I experimented with putting the spinning CPUs into MWAIT,
which caused havoc. Did neither have time nor the stomach to dig into that
further, but the ucode update _did_ fix an issue with MWAIT according to
the version history.

That's why I'm worried about instructions being "fixed" which are executed
in parallel on the sibling.

An authorative statement vs. that would be appreciated. Preferrably in form
of an extension of the SDM, but an upfront statement in this thread would
be a good start.

Thanks,

tglx