Re: [Patch v3 Part2 3/9] x86/microcode/intel: Fix collect_cpu_info() to reflect current microcode

From: Borislav Petkov
Date: Tue Jan 31 2023 - 16:09:04 EST


On Tue, Jan 31, 2023 at 08:49:52PM +0000, Luck, Tony wrote:
> What happens here if the update on the first hyperthread failed (sure, it shouldn't,
> but stuff happens at large scale)? In this case the current rev is still older than
> the cache version ... so there is no "goto out", and this hyperthread will now write
> the MSR to initiate microcode update here, while the first thread is off executing
> arbitrary code (the situation that we want to avoid).

Lemme see if I can follow: we sync all threads in __reload_late() and
once they all arrive, we send them down into ->apply_microcode.

T0 arrives, and fails the update. That is this piece:

	/* write microcode via MSR 0x79 */
	wrmsrl(MSR_IA32_UCODE_WRITE, (unsigned long)mc->bits);

	rev = intel_get_microcode_revision();

	if (rev != mc->hdr.rev) {
		pr_err("CPU%d update to revision 0x%x failed\n",
		       cpu, mc->hdr.rev);
		return UCODE_ERROR;
	}

We return here without updating cpu_sig.rev, as we should.

T1 arrives, updates successfully and updates its cpu_sig.rev.

T0's patch level has been updated too with that because the microcode
engine is shared between the threads. T0's cpu_sig.rev isn't, however,
as that has happened "behind its back", so to speak.

Is that the scenario you're talking about?

If so, if you look at __reload_late(), it'll say

	pr_warn("Error reloading microcode on CPU %d\n", cpu);

and the large scale operator will know.

And well, the easy fix is, do the reload again. :-)

That'll update the cached values too.
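
To make the sequence above concrete, here is a toy user-space model of it.
It is only a sketch, not kernel code: every toy_* name is made up for
illustration, the "MSR write" is a plain assignment, and the failure on T0
is injected by a flag. It also assumes that a second reload re-reads the
revision from the hardware and refreshes the cached value, which is what
"do the reload again" relies on.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only. All toy_* names are hypothetical, not kernel APIs. */
enum toy_state { TOY_OK, TOY_ERROR };

struct toy_core {
	unsigned int hw_rev;		/* revision held by the shared microcode engine */
};

struct toy_thread {
	struct toy_core *core;		/* both siblings point at the same core */
	unsigned int cached_rev;	/* stands in for the per-CPU cpu_sig.rev */
};

/*
 * Model one ->apply_microcode call on one thread.
 * inject_failure pretends the write to the update MSR did not take.
 */
static enum toy_state toy_apply(struct toy_thread *t, unsigned int new_rev,
				bool inject_failure)
{
	/* Assumption: hardware already current, so only refresh the cache. */
	if (t->core->hw_rev >= new_rev) {
		t->cached_rev = t->core->hw_rev;
		return TOY_OK;
	}

	if (!inject_failure)
		t->core->hw_rev = new_rev;	/* the "wrmsrl" worked */

	/* Read back and compare, like the rev != mc->hdr.rev check. */
	if (t->core->hw_rev != new_rev)
		return TOY_ERROR;		/* cached_rev is left untouched */

	t->cached_rev = new_rev;
	return TOY_OK;
}

/* Walk through the T0/T1 scenario; returns T0's cached rev at the end. */
static unsigned int toy_run_scenario(void)
{
	struct toy_core core = { .hw_rev = 0x10 };
	struct toy_thread t0 = { .core = &core, .cached_rev = 0x10 };
	struct toy_thread t1 = { .core = &core, .cached_rev = 0x10 };

	/* T0 arrives and fails the update: nothing changes. */
	assert(toy_apply(&t0, 0x20, true) == TOY_ERROR);
	assert(core.hw_rev == 0x10 && t0.cached_rev == 0x10);

	/* T1 succeeds: the shared engine is updated, T0's cache is stale. */
	assert(toy_apply(&t1, 0x20, false) == TOY_OK);
	assert(core.hw_rev == 0x20 && t1.cached_rev == 0x20);
	assert(t0.cached_rev == 0x10);

	/* Second reload: T0 sees the current revision and refreshes its cache. */
	assert(toy_apply(&t0, 0x20, false) == TOY_OK);
	return t0.cached_rev;
}
```

Calling toy_run_scenario() from a small main() walks through the three
steps. The point of the model is only that the hardware revision lives in
the core while the cached revision is per thread, so the two can disagree
until the next reload.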

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette