Re: KVM vs AMD: Re: [PATCH v3 48/59] x86/retbleed: Add SKL return thunk

From: Paolo Bonzini
Date: Mon Nov 07 2022 - 04:38:43 EST


On Fri, Nov 4, 2022 at 1:45 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Nov 03, 2022 at 10:53:54PM +0000, Andrew Cooper wrote:
> > On 21/10/2022 16:21, Nathan Chancellor wrote:
> > > On Fri, Oct 21, 2022 at 11:53:09AM +0200, Peter Zijlstra wrote:
> > >> On Thu, Oct 20, 2022 at 04:10:28PM -0700, Nathan Chancellor wrote:
> > >>> This commit is now in -next as commit 5d8213864ade ("x86/retbleed: Add
> > >>> SKL return thunk"). I just bisected an immediate reboot on my AMD test
> > >>> system when starting a virtual machine with QEMU + KVM to it (see the
> > >>> bisect log below). My Intel test systems do not show this.
> > >>> Unfortunately, I do not have much more information, as there are no logs
> > >>> in journalctl, which makes sense as the reboot occurs immediately after
> > >>> I hit the enter key for the QEMU command.
> > >>>
> > >>> If there is any further information I can provide or patches I can test
> > >>> for further debugging, I am more than happy to do so.
> > >> Moo :-(
> > >>
> > >> you happen to have a .config for me?
> > > Sure thing, sorry I did not provide it in the first place! Attached. It
> > > has been run through localmodconfig for the particular machine but I
> > > assume the core pieces should still be present.
> >
> > Following up from some debugging on IRC.
> >
> > The problem is that FILL_RETURN_BUFFER now has a per-cpu variable
> > access, and AMD SVM has a fun optimisation where the VMRUN instruction
> > doesn't swap, amongst other things, %gs.
> >
> > per-cpu variables only become safe following
> > vmload(__sme_page_pa(sd->save_area)); in svm_vcpu_enter_exit().
> >
> > Given that retbleed=force ought to work on non-skylake hardware, the
> > appropriate fix is to move the VMLOAD/VMSAVE's down into asm and put
> > them adjacent to VMRUN.
> >
> > This also addresses an undocumented dependency where it's only the memory
> > clobber in vmload() which stops the compiler moving
> > svm_vcpu_enter_exit()'s calculation of sd into an unsafe position.
>
> So, aside from wasting the entire morning on resuscitating my AMD
> Interlagos, I ended up with the below patch which seems to work.
>
> Not being a virt person, I'm sure I've messed up something, please
> advise.

Oh, that was fast. I was doing similar stuff to move MSR_IA32_SPEC_CTRL
save/restore to assembly, because we're not sure it's safe to do the restore
in C code, and there is overlap with this change. I'll get it out today.

The main issue in the patch below is that _ASM_ARG4 does not exist
on 32-bit, and also _ASM_ARG3 is kinda off limits because I need it
for the aforementioned MSR_IA32_SPEC_CTRL change.
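
To illustrate the register constraint, here is a minimal standalone
sketch, not kernel code. It assumes the usual conventions: six integer
argument registers on 64-bit, and three (eax, edx, ecx) on 32-bit, where
the kernel is built with -mregparm=3. The tables and the asm_arg_reg()
helper are hypothetical names made up for this example.

```c
#include <stddef.h>

/* Illustrative tables only; the names below are not kernel identifiers. */
static const char *const arg_regs_64[] = {
	"rdi", "rsi", "rdx", "rcx", "r8", "r9"
};
static const char *const arg_regs_32[] = {
	"eax", "edx", "ecx"	/* regparm(3): only three register args */
};

/*
 * Return the register that would carry the nth (1-based) integer
 * argument, or NULL if that argument does not get a register.
 * Any bits value other than 64 is treated as 32-bit.
 */
static const char *asm_arg_reg(int bits, unsigned int n)
{
	const char *const *tbl = (bits == 64) ? arg_regs_64 : arg_regs_32;
	size_t count = (bits == 64) ? 6 : 3;

	if (n == 0 || n > count)
		return NULL;
	return tbl[n - 1];
}
```

With this model asm_arg_reg(32, 4) returns NULL, so asm shared between
32-bit and 64-bit can rely on at most three register arguments.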

Otherwise it's similar to my change.

Paolo