Re: [PATCH] x86/speculation, KVM: respect user IBPB configuration

From: Jon Kohler
Date: Fri Apr 15 2022 - 11:16:16 EST




> On Apr 15, 2022, at 10:28 AM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Mon, Apr 11, 2022, Jon Kohler wrote:
>> On vmx_vcpu_load_vmcs and svm_vcpu_load, respect user IBPB config and only
>> attempt IBPB MSR if either always_ibpb or cond_ibpb and the vcpu thread
>> has TIF_SPEC_IB.
>>
>> A vcpu thread will have TIF_SPEC_IB on qemu-kvm using -sandbox on if
>> kernel cmdline spectre_v2_user=seccomp, which would indicate that the user
>> is looking for a higher security environment and has workloads that need
>> to be secured from each other.
>>
>> Note: The behavior of spectre_v2_user recently changed in 5.16 on
>> commit 2f46993d83ff ("x86: change default to
>> spec_store_bypass_disable=prctl spectre_v2_user=prctl")
>>
>> Prior to that, qemu-kvm with -sandbox on would also have TIF_SPEC_IB
>> if spectre_v2_user=auto.
>>
>> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
>> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
>> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
>> Cc: Waiman Long <longman@xxxxxxxxxx>
>> ---
>> arch/x86/include/asm/spec-ctrl.h | 12 ++++++++++++
>> arch/x86/kernel/cpu/bugs.c | 6 ++++--
>> arch/x86/kvm/svm/svm.c | 2 +-
>> arch/x86/kvm/vmx/vmx.c | 2 +-
>> 4 files changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/spec-ctrl.h b/arch/x86/include/asm/spec-ctrl.h
>> index 5393babc0598..552757847d5b 100644
>> --- a/arch/x86/include/asm/spec-ctrl.h
>> +++ b/arch/x86/include/asm/spec-ctrl.h
>> @@ -85,4 +85,16 @@ static inline void speculative_store_bypass_ht_init(void) { }
>> extern void speculation_ctrl_update(unsigned long tif);
>> extern void speculation_ctrl_update_current(void);
>>
>> +/*
>> + * Always issue IBPB if switch_mm_always_ibpb and respect conditional
>> + * IBPB if this thread does not have !TIF_SPEC_IB.
>> + */
>> +static inline void maybe_indirect_branch_prediction_barrier(void)
>
> I think it makes sense to give this a virtualization specific name, e.g.
> x86_virt_cond_indirect_branch_prediction_barrier() or x86_virt_cond_ibpb(),
> to follow x86_virt_spec_ctrl(). Or if "cond" is misleading in the "always" case,
> perhaps x86_virt_guest_switch_ibpb()?

+Bijan
Yea, good suggestion. Bijan and I were going back and forth on this in our
internal review and couldn’t land on a good name. I like this naming better.

>
>> +{
>> + if (static_key_enabled(&switch_mm_always_ibpb) ||
>> + (static_key_enabled(&switch_mm_cond_ibpb) &&
>> + test_thread_flag(TIF_SPEC_IB)))
>
> The cond_ibpb case in particular needs a more detailed comment. Specifically it
> should call out why this path doesn't do IBPB when switching from a task with
> TIF_SPEC_IB to a task without TIF_SPEC_IB, whereas cond_mitigation() does emit
> IBPB when switching mms in this case.
>
> But stepping back, why does KVM do its own IBPB in the first place? The goal is
> to prevent one vCPU from attacking the next vCPU run on the same pCPU. But unless
> userspace is running multiple VMs in the same process/mm_struct, switching vCPUs,
> i.e. switching tasks, will also switch mm_structs and thus do IPBP via cond_mitigation.

Good question, I couldn’t figure out the answer to this by walking the code and looking
at git history/blame for this area. Are there VMMs that even run multiple VMs within
the same process? The only case I could think of is a nested situation?

>
> If userspace runs multiple VMs in the same process, enables cond_ipbp, _and_ sets
> TIF_SPEC_IB, then it's being stupid and isn't getting full protection in any case,
> e.g. if userspace is handling an exit-to-userspace condition for two vCPUs from
> different VMs, then the kernel could switch between those two vCPUs' tasks without
> bouncing through KVM and thus without doing KVM's IBPB.

Exactly, so meaning that the only time this would make sense is for some sort of nested
situation or some other funky VMM tomfoolery, but that nested hypervisor might not be
KVM, so it's a farce, yea? Meaning that even in that case, there is zero guarantee
from the host kernel perspective that barriers within that process are being issued on
switch, which would make this security posture just window dressing?

>
> I can kinda see doing this for always_ibpb, e.g. if userspace is unaware of spectre
> and is naively running multiple VMs in the same process.

Agreed. I’ve thought of always_ibpb as "paranoid mode" and if a user signs up for that,
they rarely care about the fast path / performance implications, even if some of the
security surface area is just complete window dressing :(

Looking forward, what if we simplified this to have KVM issue barriers IFF always_ibpb?

And drop the cond’s, since the switching mm_structs should take care of that?

The nice part is that then the cond_mitigation() path handles the going to thread
with flag or going from a thread with flag situation gracefully, and we don’t need to
try to duplicate that smarts in kvm code or somewhere else.

>
> What am I missing?