Re: [PATCH v4 2/4] x86/bus_lock: Handle warn and fatal in #DB for bus lock

From: Thomas Gleixner
Date: Wed Jan 27 2021 - 16:17:02 EST


On Tue, Nov 24 2020 at 20:52, Fenghua Yu wrote:

> #DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR
> while #AC for split lock is enabled by split lock detection bit 29 in
> TEST_CTRL MSR.
>
> Delivery of #DB for bus lock in userspace clears DR6[11]. To avoid
> confusion in identifying #DB, #DB handler sets the bit to 1 before
> returning to the interrupted task.
>
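
For reference, the bits named above can be written down as a small
user-space sketch. The macro and helper names are made up for
illustration and are not taken from the patch; only the bit positions
and the "set DR6[11] back to 1" step come from the changelog:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as stated in the quoted changelog; names are illustrative. */
#define DEBUGCTL_BUS_LOCK_DETECT	(UINT64_C(1) << 2)	/* DEBUGCTL MSR bit 2 */
#define TEST_CTRL_SPLIT_LOCK_DETECT	(UINT64_C(1) << 29)	/* TEST_CTRL MSR bit 29 */
#define DR6_BUS_LOCK			(UINT64_C(1) << 11)	/* DR6[11] */

/* A bus lock #DB is reported by DR6[11] reading as 0. */
static int dr6_reports_bus_lock(uint64_t dr6)
{
	return (dr6 & DR6_BUS_LOCK) == 0;
}

/* Set DR6[11] back to 1, as the changelog says the handler does. */
static uint64_t dr6_rearm_bus_lock(uint64_t dr6)
{
	uint64_t rearmed = dr6 | DR6_BUS_LOCK;

	assert(!dr6_reports_bus_lock(rearmed));
	return rearmed;
}
```
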
> Use the existing kernel command line option "split_lock_detect=" to handle
> #DB for bus lock:
>
> split_lock_detect=
>         #AC for split lock          #DB for bus lock
>
> off     Do nothing                  Do nothing
>
> warn    Kernel OOPs                 Warn once per task and
>         Warn once per task and      continues to run.
>         disable future checking     When both features are
>                                     supported, warn in #DB

Which means that we don't catch kernel split locks anymore with 'warn'
if bus lock detection is supported. WHY? There is zero rationale for
this change in the changelog.

> fatal   Kernel OOPs                 Send SIGBUS to user
>         Send SIGBUS to user         When both features are
>                                     supported, split lock
>                                     triggers #AC and bus lock
>                                     from non-WB triggers #DB.
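
To make the consequence of that table concrete, here is a rough
user-space model of which exception ends up armed per mode. All names
are hypothetical, and it assumes that kernel-mode split locks are only
ever reported through #AC, which is how I read the table:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names; these mirror the three split_lock_detect= modes. */
enum sld_mode { MODE_OFF, MODE_WARN, MODE_FATAL };

/* Which exceptions end up armed for a given mode and hardware support. */
struct armed {
	bool ac;	/* #AC for split lock */
	bool db;	/* #DB for bus lock */
};

static struct armed armed_exceptions(enum sld_mode mode, bool has_sld, bool has_bld)
{
	struct armed a = { false, false };

	assert(mode <= MODE_FATAL);

	switch (mode) {
	case MODE_OFF:
		break;
	case MODE_WARN:
		/* Table: "When both features are supported, warn in #DB". */
		a.db = has_bld;
		a.ac = has_sld && !has_bld;
		break;
	case MODE_FATAL:
		/* Table: split lock triggers #AC, bus lock from non-WB triggers #DB. */
		a.ac = has_sld;
		a.db = has_bld;
		break;
	}
	return a;
}

/* Assumption: kernel-mode split locks are reported only through #AC. */
static bool kernel_split_lock_caught(enum sld_mode mode, bool has_sld, bool has_bld)
{
	return armed_exceptions(mode, has_sld, has_bld).ac;
}
```

With both features present, kernel_split_lock_caught() is false for
'warn' and true for 'fatal' in this model, which is exactly the
asymmetry questioned above.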


> /*
> - * Default to sld_off because most systems do not support split lock detection
> - * split_lock_setup() will switch this to sld_warn on systems that support
> - * split lock detect, unless there is a command line override.
> + * Default to sld_off because most systems do not support split lock detection.
> + * sld_state_setup() will switch this to sld_warn on systems that support
> + * split lock/bus lock detect, unless there is a command line override.
> */
> static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
> static u64 msr_test_ctrl_cache __ro_after_init;
> +/* Split lock detection is enabled if it's true. */
> +static bool sld;

Why did you bother with 3 letters? bool s, b; along with comments
explaining what it means would have been sufficient, right?

sld_enable/bld_enable would be too self-explanatory, and this also lacks
__ro_after_init.

Aside from that, it's beyond silly because bld and sld are just shadowing
the corresponding CPU feature bits. So what are these variables gaining
aside from confusion?
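
Something along these lines would do, sketched here as a user-space
model. The feature table and every name in it are hypothetical
stand-ins; in the kernel the helpers would presumably just wrap a
check of the relevant CPU feature bit instead of keeping a second copy
of the state:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two CPU feature bits. */
enum cpu_feature {
	FEAT_SPLIT_LOCK_DETECT,
	FEAT_BUS_LOCK_DETECT,
	FEAT_COUNT
};

/* Single source of truth for what the CPU supports (model only). */
static bool cpu_features[FEAT_COUNT];

static void set_cpu_feature(enum cpu_feature f, bool on)
{
	assert(f < FEAT_COUNT);
	cpu_features[f] = on;
}

static bool cpu_has(enum cpu_feature f)
{
	assert(f < FEAT_COUNT);
	return cpu_features[f];
}

/* Descriptive helpers instead of shadow booleans named sld/bld. */
static bool split_lock_detect_supported(void)
{
	return cpu_has(FEAT_SPLIT_LOCK_DETECT);
}

static bool bus_lock_detect_supported(void)
{
	return cpu_has(FEAT_BUS_LOCK_DETECT);
}
```
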

> +/* Bus lock detection is enabled if it's true. */
> +static bool bld;
>
> +static void __init sld_state_setup(void)

This is confusing as hell. sld_state_setup() is used for bus lock as
well, and split_lock_detect_state is no less confusing. It took me five
reads to figure out how all of that works.

> +static void __init _split_lock_setup(void)

We generally use two underscores for readability's sake.

> +{
> +	if (!split_lock_verify_msr(false)) {
> +		pr_info("MSR access failed: Disabled\n");

> /*
> @@ -1079,6 +1084,15 @@ static void sld_update_msr(bool on)
>
> static void split_lock_init(void)
> {
> +	/*
> +	 * If supported, #DB for bus lock will handle warn
> +	 * and #AC for split lock is disabled.

Why does this disable the kernel detection? Just because?

> +void handle_bus_lock(struct pt_regs *regs)
> +{
> +	if (!bld)
> +		return;

How is #DB ever calling this function when the debug MSR bit is not set?

> -void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> +static void __init split_lock_setup(struct cpuinfo_x86 *c)
> {
> 	const struct x86_cpu_id *m;
> 	u64 ia32_core_caps;
> @@ -1189,5 +1237,43 @@ void __init cpu_set_core_cap_bits(struct cpuinfo_x86 *c)
> 	}
>
> 	cpu_model_supports_sld = true;
> -	split_lock_setup();
> +	_split_lock_setup();
> +}
> +
> +static void sld_state_show(void)
> +{
> +	if (!bld && !sld)
> +		return;
> +
> +	switch (sld_state) {
> +	case sld_off:
> +		pr_info("disabled\n");
> +		break;
> +	case sld_warn:
> +		if (bld)
> +			pr_info("#DB: warning about user-space bus_locks\n");
> +		else
> +			pr_info("#AC: crashing the kernel about kernel split_locks and warning about user-space split_locks\n");

crashing about?

> +		break;
> +	case sld_fatal:
> +		if (sld)
> +			pr_info("#AC: crashing the kernel on kernel split_locks and sending SIGBUS on user-space split_locks\n");
> +		if (bld)
> +			pr_info("#DB: sending SIGBUS on user-space bus_locks%s\n", sld ? " from non-WB" : "");
> +		break;
> +	}

Thanks,

tglx