Re: [PATCH v3] x86/speculation, KVM: only IBPB for switch_mm_always_ibpb on vCPU load

From: Sean Christopherson
Date: Fri Apr 29 2022 - 19:23:42 EST


On Sat, Apr 30, 2022, Borislav Petkov wrote:
> On Fri, Apr 29, 2022 at 09:59:52PM +0000, Sean Christopherson wrote:
> > Correct, but KVM also doesn't do IBPB on VM-Exit (or VM-Entry),
>
> Why doesn't it do that? Not needed?

The host kernel is protected via RETPOLINE and by flushing the RSB immediately
after VM-Exit.

> > nor does KVM do IBPB before exiting to userspace.
>
> Same question.

I don't know definitively. My guess is that IBPB is far too costly to do on every
exit, and so the onus was put on userspace to recompile with RETPOLINE. What I
don't know is why it wasn't implemented as an opt-out feature.

> > The IBPB we want to whack is issued only when KVM is switching vCPUs.
>
> Then please document it properly as I've already requested.

I'll write up the bits I have my head wrapped around.

> > Except that _none_ of that documentation explains why the hell KVM
> > does IBPB when switching between vCPUs.
>
> Probably because the folks involved in those patches weren't the hell
> mainly virt people. Although I see a bunch of virt people on CC on that
> patch.
>
> > : But stepping back, why does KVM do its own IBPB in the first place?  The goal is
> > : to prevent one vCPU from attacking the next vCPU run on the same pCPU.  But unless
> > : userspace is running multiple VMs in the same process/mm_struct, switching vCPUs,
> > : i.e. switching tasks, will also switch mm_structs and thus do IPBP via cond_mitigation.
> > :
> > : If userspace runs multiple VMs in the same process,
>
> This keeps popping up. Who does that? Can I get a real-life example to
> such VM-based containers or what the hell that is, pls?

I don't know of any actual examples. But, it's trivially easy to create multiple
VMs in a single process, and so proving the negative that no one runs multiple VMs
in a single address space is essentially impossible.

The container thing is just one scenario I can think of where userspace might
actually benefit from sharing an address space, e.g. it would allow backing the
image for a large number of VMs with a single set of read-only VMAs.
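
To make the "trivially easy" point concrete, here is a minimal sketch of one
process creating two VMs. It uses only the KVM_CREATE_VM ioctl on /dev/kvm;
the helper name create_two_vms() is made up for illustration, and no vCPUs or
memory are set up. Any vCPU threads later created for these two VMs would
share the process's single mm_struct.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Create two VMs from one process. Returns 0 on success and stores the VM
 * file descriptors in vm_fds; returns -1 if /dev/kvm cannot be opened or a
 * VM cannot be created. create_two_vms() is a hypothetical helper name.
 */
int create_two_vms(int vm_fds[2])
{
	int kvm_fd = open("/dev/kvm", O_RDWR);

	vm_fds[0] = vm_fds[1] = -1;
	if (kvm_fd < 0)
		return -1;

	/* Each KVM_CREATE_VM call returns a new fd for a distinct VM. */
	vm_fds[0] = ioctl(kvm_fd, KVM_CREATE_VM, 0);
	if (vm_fds[0] >= 0)
		vm_fds[1] = ioctl(kvm_fd, KVM_CREATE_VM, 0);

	/* This sketch has no further use for the /dev/kvm fd. */
	close(kvm_fd);

	if (vm_fds[1] < 0) {
		/* Undo the partial setup so the caller sees all-or-nothing. */
		if (vm_fds[0] >= 0)
			close(vm_fds[0]);
		vm_fds[0] = vm_fds[1] = -1;
		return -1;
	}
	return 0;
}
```

Nothing in the KVM API ties a VM to its own process, which is why the
"no one does this" claim is hard to prove.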

> > enables cond_ibpb, _and_ sets
> > : TIF_SPEC_IB, then it's being stupid and isn't getting full protection in any case,
> > : e.g. if userspace is handling an exit-to-userspace condition for two vCPUs from
> > : different VMs, then the kernel could switch between those two vCPUs' tasks without
> > : bouncing through KVM and thus without doing KVM's IBPB.
> > :
> > : I can kinda see doing this for always_ibpb, e.g. if userspace is unaware of spectre
> > : and is naively running multiple VMs in the same process.
>
> So this needs a clearer definition: what protection are we even talking
> about when the address spaces of processes are shared? My naïve
> thinking would be: none. They're sharing address space - branch pred.
> poisoning between the two is the least of their worries.

I truly have no idea, which is part of the reason I brought it up in the first
place. I'd have happily just whacked KVM's IBPB entirely, but it seemed prudent
to preserve the existing behavior if someone went out of their way to enable
switch_mm_always_ibpb.

> So to cut to the chase: it sounds to me like you don't want to do IBPB
> at all on vCPU switch.

Yes, or do it iff switch_mm_always_ibpb is enabled to maintain "compatibility".

> And the process switch case is taken care of by switch_mm().

Yep.
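
For reference, the two decision points discussed in this thread can be modeled
roughly as below. This is a simplified userspace sketch, not kernel code: the
enum, struct and function names are made up for illustration, and the
switch_mm() side only approximates what cond_mitigation does (it ignores the
"last user mm" tracking, for example).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the switch_mm_{cond,always}_ibpb settings. */
enum ibpb_mode { IBPB_NONE, IBPB_COND, IBPB_ALWAYS };

/* Hypothetical, minimal model of a task. */
struct task_model {
	int mm_id;        /* identifies the address space (mm_struct) */
	bool tif_spec_ib; /* models TIF_SPEC_IB */
};

/* Process switch: a barrier is considered only when the mm changes. */
bool ibpb_on_switch_mm(enum ibpb_mode mode,
		       const struct task_model *prev,
		       const struct task_model *next)
{
	if (prev->mm_id == next->mm_id)
		return false;
	if (mode == IBPB_ALWAYS)
		return true;
	if (mode == IBPB_COND)
		return prev->tif_spec_ib || next->tif_spec_ib;
	return false;
}

/* vCPU load: the proposal keeps KVM's IBPB only for always_ibpb. */
bool ibpb_on_vcpu_load(enum ibpb_mode mode, int prev_vcpu, int next_vcpu)
{
	return prev_vcpu != next_vcpu && mode == IBPB_ALWAYS;
}
```

In this model, two vCPU tasks sharing one mm never get a barrier from the
switch_mm() path, so the vCPU-load check is the only place the always_ibpb
case is covered.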