Re: [PATCH] x86/xen: zero MSR_IA32_SPEC_CTRL before suspend

From: Jan Beulich
Date: Wed Mar 14 2018 - 04:49:05 EST


>>> On 26.02.18 at 15:08, <jgross@xxxxxxxx> wrote:
> @@ -35,6 +40,9 @@ void xen_arch_post_suspend(int cancelled)
>
> static void xen_vcpu_notify_restore(void *data)
> {
> + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL))
> + wrmsrl(MSR_IA32_SPEC_CTRL, this_cpu_read(spec_ctrl));
> +
> /* Boot processor notified via generic timekeeping_resume() */
> if (smp_processor_id() == 0)
> return;
> @@ -44,7 +52,15 @@ static void xen_vcpu_notify_restore(void *data)
>
> static void xen_vcpu_notify_suspend(void *data)
> {
> + u64 tmp;
> +
> tick_suspend_local();
> +
> + if (xen_pv_domain() && boot_cpu_has(X86_FEATURE_SPEC_CTRL)) {
> + rdmsrl(MSR_IA32_SPEC_CTRL, tmp);
> + this_cpu_write(spec_ctrl, tmp);
> + wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> + }
> }

While investigating ways how to do something similar on our old,
non-pvops kernels I've started wondering if this solution is actually
correct in all cases. Of course discussing this is complicated by the
fact that the change there might be a conflict with hasn't landed
in Linus'es tree yet (see e.g.
https://patchwork.kernel.org/patch/10153843/ for an upstream
submission; I haven't been able to find any discussion on that
patch or why it isn't upstream yet), but we have it in our various
branches. The potential problem I'm seeing is with the clearing
and re-setting of SPEC_CTRL around CPUs going idle. While the
active CPU could have preemption disabled (if that isn't the case
already), the passive CPUs are - afaict - neither under full control
of drivers/xen/manage.c:do_suspend() nor excluded yet from
any further scheduling activity. Hence with code like this (taken
from one of our branches)

static void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
trace_cpu_idle_rcuidle(1, smp_processor_id());
if (this_cpu_has(X86_BUG_CLFLUSH_MONITOR)) {
smp_mb(); /* quirk */
clflush((void *)&current_thread_info()->flags);
smp_mb(); /* quirk */
}

x86_disable_ibrs();

__monitor((void *)&current_thread_info()->flags, 0, 0);
if (!need_resched())
__sti_mwait(0, 0);
else
local_irq_enable();

x86_enable_ibrs();
...

the MSR might get set to non-zero again after having been
cleared by the code your patch adds. I therefore think that the
only race free solution would be to do the clearing from
stop-machine context. But maybe I'm overlooking something.

Jan