Re: [tip:x86/alternatives] [x86/alternatives] ee8962082a: WARNING:at_arch/x86/kernel/cpu/cpuid-deps.c:#do_clear_cpu_cap

From: Sean Christopherson
Date: Mon May 06 2024 - 11:28:56 EST


On Mon, May 06, 2024, Oliver Sang wrote:
> hi, Boris,
>
> On Sat, May 04, 2024 at 02:48:22PM +0200, Borislav Petkov wrote:
> > On Wed, May 01, 2024 at 12:33:05AM +0200, Borislav Petkov wrote:
> > > On Tue, Apr 30, 2024 at 12:51:02PM -0700, Sean Christopherson wrote:
> > > > But that would just mask the underlying problem, it wouldn't actually fix anything
> > > > other than making the WARN go away. Unless I'm misreading the splat+code, the
> > > > issue isn't that init_ia32_feat_ctl() clears VMX late, it's that the BSP sees
> > > > VMX as fully enabled, but at least one AP sees VMX as disabled.
> > > >
> > > > I don't see how the kernel can expect to function correctly with divergent feature
> > > > support across CPUs, i.e. the WARN is a _good_ thing in this case, because it
> > > > alerts the user that their system is messed up, e.g. has a bad BIOS or something.
> > >
> > > Yes, and yes.
> > >
> > > There are two issues. Clearing feature flags after alternatives have
> > > been applied should not happen, and this particular issue with that box.
> > >
> > > Lemme cook up something in the coming days for the former.
> >
> > Two simple patches as a reply to this.
> >
> > Oliver, can you run them on your box pls?
>
> we confirmed after applying them upon ee8962082a, the WARNING which was reported
> in our original report cannot be reproduced any longer.

I am so confused. With both patches applied, simulating VMX being disabled by
BIOS, which is a _legal_ configuration, yields:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/cpuid-deps.c:118 do_clear_cpu_cap+0xf6/0x120
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.9.0-rc3-28ed6849f6ae-rev/boris-vm #389
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:do_clear_cpu_cap+0xf6/0x120
Call Trace:
<TASK>
init_ia32_feat_ctl+0x133/0x420
init_intel+0x11/0x330
identify_cpu+0x242/0x670
identify_secondary_cpu+0xe/0x40
smp_store_cpu_info+0x48/0x60
start_secondary+0x73/0x120
common_startup_64+0x13b/0x141
</TASK>
---[ end trace 0000000000000000 ]---

That's completely expected (by me at least), because as I said in the original
thread, secondary CPUs are identified after alternatives are applied, and when VMX
is disabled by BIOS, the feature flag will be initially set based on CPUID, and
then cleared by init_ia32_feat_ctl(). I.e. patch 1 is wrong on multiple levels.

And without _either_ patch applied, no WARN occurs, which is again expected (by
me), because init_ia32_feat_ctl() runs _before_ alternative_instructions() on the
boot CPU, i.e. alternatives_patched will be false when do_clear_cpu_cap() is
called by the boot CPU, and the boot_cpu_has(feature) that guards the WARN will
be false when do_clear_cpu_cap() is called by secondary CPUs.

The only way the WARN could have fired without this series is if VMX is enabled
in BIOS on the boot CPU, but disabled by BIOS on one more secondary CPUs. And
_that_ is a bogus setup that (a) the kernel absolutely should WARN about, and
(b) _still_ occurs with one or both patches applied.

So I don't see how this series could possibly have fixed the issue Oliver
encountered, nor do I see any value in moving init_ia32_feat_ctl() into
early_init_intel().