Re: [PATCH v3 2/8] x86/split_lock: Ensure X86_FEATURE_SPLIT_LOCK_DETECT means the existence of feature

From: Xiaoyao Li
Date: Thu Mar 05 2020 - 21:15:33 EST


On 3/6/2020 12:23 AM, Sean Christopherson wrote:
On Wed, Mar 04, 2020 at 09:49:14AM +0800, Xiaoyao Li wrote:
On 3/4/2020 3:41 AM, Sean Christopherson wrote:
On Tue, Mar 03, 2020 at 10:55:24AM -0800, Sean Christopherson wrote:
On Thu, Feb 06, 2020 at 03:04:06PM +0800, Xiaoyao Li wrote:
When flag X86_FEATURE_SPLIT_LOCK_DETECT is set, it should ensure the
existence of MSR_TEST_CTRL and MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit.

The changelog confused me a bit. "When flag X86_FEATURE_SPLIT_LOCK_DETECT
is set" makes it sound like the logic is being applied after the feature
bit is set. Maybe something like:

```
Verify MSR_TEST_CTRL.SPLIT_LOCK_DETECT can be toggled via WRMSR prior to
setting the SPLIT_LOCK_DETECT feature bit so that runtime consumers,
e.g. KVM, don't need to worry about WRMSR failure.
```

Signed-off-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx>
---
arch/x86/kernel/cpu/intel.c | 41 +++++++++++++++++++++----------------
1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2b3874a96bd4..49535ed81c22 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -702,7 +702,8 @@ static void init_intel(struct cpuinfo_x86 *c)
if (tsx_ctrl_state == TSX_CTRL_DISABLE)
tsx_disable();
- split_lock_init();
+ if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
+ split_lock_init();
}
#ifdef CONFIG_X86_32
@@ -986,9 +987,26 @@ static inline bool match_option(const char *arg, int arglen, const char *opt)
static void __init split_lock_setup(void)
{
+ u64 test_ctrl_val;
char arg[20];
int i, ret;
+ /*
+ * Use the "safe" versions of rdmsr/wrmsr here to ensure MSR_TEST_CTRL
+ * and MSR_TEST_CTRL.SPLIT_LOCK_DETECT bit do exist. Because there may
+ * be glitches in virtualization that leave a guest with an incorrect
+ * view of real h/w capabilities.
+ */
+ if (rdmsrl_safe(MSR_TEST_CTRL, &test_ctrl_val))
+ return;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL,
+ test_ctrl_val | MSR_TEST_CTRL_SPLIT_LOCK_DETECT))
+ return;
+
+ if (wrmsrl_safe(MSR_TEST_CTRL, test_ctrl_val))
+ return;

Probing the MSR should be skipped if SLD is disabled in sld_options, i.e.
move this code (and setup_force_cpu_cap() etc...) down below the
match_option() logic. The above would temporarily enable SLD even if the
admin has explicitly disabled it, e.g. makes the kernel param useless for
turning off the feature due to bugs.

Hmm, but this prevents KVM from exposing SLD to a guest when it's off in
the kernel, which would be a useful debug/testing scenario.

Maybe add another SLD state to forcefully disable SLD? That way the admin
can turn off SLD in the host kernel but still allow KVM to expose it to its
guests. E.g.

I don't think we need to do this.

IMO, this is a bug in split_lock_init(), which assumes the initial value of
MSR_TEST_CTRL is zero, or at least that its SPLIT_LOCK bit is zero.
That is a problem, since it's possible that the BIOS has set this bit.

Hmm, yeah, that's a bug. But it's a separate bug.
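
To illustrate the bug being discussed: a minimal sketch, not the kernel's code, of deriving the value to write from the value actually read instead of assuming the register starts at zero. The helper name test_ctrl_target() and the SLD_BIT macro are hypothetical; bit 29 is the position of SPLIT_LOCK_DETECT in MSR_TEST_CTRL.

```c
#include <stdbool.h>
#include <stdint.h>

/* SPLIT_LOCK_DETECT is bit 29 of MSR_TEST_CTRL. */
#define SLD_BIT (1ULL << 29)

/*
 * Hypothetical helper: compute the value to write back from the value
 * that was read. If firmware already set the bit, the "off" case
 * clears it explicitly; all other bits are preserved in both cases.
 */
static uint64_t test_ctrl_target(uint64_t current, bool enable)
{
    return enable ? (current | SLD_BIT) : (current & ~SLD_BIT);
}
```

With this shape, a bit left set by the BIOS is cleared when detection is off, and unrelated bits in the register are never clobbered.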
split_lock_setup() here is to check whether the feature really exists, so it
probes MSR_TEST_CTRL and the MSR_TEST_CTRL_SPLIT_LOCK_DETECT bit. If both
exist, it calls setup_force_cpu_cap(X86_FEATURE_SPLIT_LOCK_DETECT) to indicate
that the feature does exist.
Only when the feature exists is there a need to parse the split_lock_detect
command line option.

Toggling SPLIT_LOCK before checking the kernel param is bad behavior, e.g.
if someone has broken silicon that causes explosions if SPLIT_LOCK=1. The
behavior is especially bad because cpu_set_core_cap_bits() enumerates split
lock detection using FMS, i.e. clearcpuid to kill CORE_CAPABILITIES
wouldn't work either.
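
The ordering argued for above (parse the kernel parameter first, probe the MSR only if detection is not off) can be sketched as a user-space model. Everything here is an illustrative assumption rather than kernel code: struct fake_msr and the fake_*_safe() helpers stand in for the real MSR and rdmsrl_safe()/wrmsrl_safe(), and the default of "warn" for an unrecognized or missing option is assumed for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLD_BIT (1ULL << 29)   /* SPLIT_LOCK_DETECT bit in MSR_TEST_CTRL */

/* Hypothetical stand-in for MSR_TEST_CTRL on one CPU. */
struct fake_msr {
    bool     present;   /* false: every access "faults" */
    uint64_t writable;  /* mask of bits that accept writes */
    uint64_t value;
    unsigned writes;    /* count of successful writes */
};

static int fake_rdmsr_safe(const struct fake_msr *m, uint64_t *val)
{
    if (!m->present)
        return -1;
    *val = m->value;
    return 0;
}

static int fake_wrmsr_safe(struct fake_msr *m, uint64_t val)
{
    /* Writing an unimplemented bit fails, modelling a #GP. */
    if (!m->present || (val & ~m->writable))
        return -1;
    m->value = val;
    m->writes++;
    return 0;
}

enum sld_state { SLD_OFF, SLD_WARN, SLD_FATAL };

static enum sld_state parse_sld_option(const char *arg)
{
    if (arg && !strcmp(arg, "off"))
        return SLD_OFF;
    if (arg && !strcmp(arg, "fatal"))
        return SLD_FATAL;
    return SLD_WARN;   /* assumed default for this sketch */
}

/* Returns true if the feature flag should be set. */
static bool sld_setup(struct fake_msr *m, const char *arg,
                      enum sld_state *state)
{
    uint64_t val;

    *state = parse_sld_option(arg);
    if (*state == SLD_OFF)
        return false;              /* never touch the MSR */

    if (fake_rdmsr_safe(m, &val))
        return false;              /* MSR does not exist */
    if (fake_wrmsr_safe(m, val | SLD_BIT))
        return false;              /* bit cannot be set */
    if (fake_wrmsr_safe(m, val))
        return false;              /* original value cannot be restored */
    return true;
}
```

In this model, "off" leaves the write counter at zero, which is the property asked for: the bit is never toggled, even temporarily, when the admin has disabled the feature. The trade-off raised earlier in the thread still applies, since skipping the probe also means the feature flag is never set for KVM's benefit.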


Things get complicated when we take all of this into account.

We check the kernel param first on the BSP; if it's sld_off, we don't set the X86_FEATURE_SPLIT_LOCK_DETECT flag. Then, when the APs boot, X86_FEATURE_SPLIT_LOCK_DETECT is not set, so they won't call split_lock_init().

However, because the X86_FEATURE_SPLIT_LOCK_DETECT flag is not set, clearing the SLD bit on each AP when sld_off (in case the BIOS has set it) won't work. So in split_lock_setup() here, if sld_off, we don't set the X86_FEATURE_SPLIT_LOCK_DETECT flag, and we also need to send an IPI to each AP to clear the SLD bit?
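
What "clear the SLD bit on every CPU" would amount to can be modelled in user space as below. This is only a sketch under stated assumptions: the array holds one hypothetical MSR_TEST_CTRL value per CPU, and clear_sld_on_all_cpus() is an invented name. In the kernel the per-CPU step would have to run on each CPU itself (for example via a cross-CPU call), which this model does not attempt to show.

```c
#include <stddef.h>
#include <stdint.h>

#define SLD_BIT (1ULL << 29)   /* SPLIT_LOCK_DETECT bit in MSR_TEST_CTRL */

/*
 * Hypothetical model: test_ctrl[cpu] is that CPU's MSR_TEST_CTRL value.
 * Clear the SLD bit wherever firmware left it set, preserving all other
 * bits, and return how many CPUs needed the fix.
 */
static size_t clear_sld_on_all_cpus(uint64_t *test_ctrl, size_t ncpus)
{
    size_t cleared = 0;

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (test_ctrl[cpu] & SLD_BIT) {
            test_ctrl[cpu] &= ~SLD_BIT;
            cleared++;
        }
    }
    return cleared;
}
```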