Re: [PATCH v2 11/11] KVM: x86: emulator/smm: preserve interrupt shadow in SMRAM

From: Maxim Levitsky
Date: Wed Jul 06 2022 - 16:00:23 EST


On Wed, 2022-07-06 at 11:13 -0700, Jim Mattson wrote:
> On Tue, Jul 5, 2022 at 6:38 AM Maxim Levitsky <mlevitsk@xxxxxxxxxx> wrote:
>
> > Most of the SMI save state area is reserved, and the handler has no way of knowing
> > what CPU stored there, it can only access the fields that are reserved in the spec.
> >
> > Yes, if the SMI handler really insists it can see that the saved RIP points to an
> > instruction that follows the STI, but does that really matter? It is allowed by the
> > spec explicitly anyway.
>
> I was just pointing out that the difference between blocking SMI and
> not blocking SMI is, in fact, observable.

Yes, and I agree, I should have said that while observable,
it should cause no problem.


>
> > Plus our SMI layout (at least for 32 bit) doesn't confirm to the X86 spec anyway,
> > we as I found out flat out write over the fields that have other meaning in the X86 spec.
>
> Shouldn't we fix that?
I am afraid we can't because that will break (in theory) the backward compatibility
(e.g if someone migrates a VM while in SMM).

Plus this is only for 32 bit layout which is only used when the guest has no long
mode in CPUID, which is only used these days by 32 bit qemu

(I found it the hard way when I found that SMM with a nested guest doesn't work
for me on 32 bit, and it was because the KVM doesn't bother to save/restore the
running nested guest vmcb address, when we use 32 bit SMM layout, which makes
sense because truly 32 bit only AMD cpus likely didn't had SVM).

But then after looking at SDM I also found out that Intel and AMD have completely
different SMM layout for 64 bit. We follow the AMD's layout, but we don't
implement many fields, including some that are barely/not documented.
(e.g what is svm_guest_virtual_int?)

In theory we could use Intel's layout when we run with Intel's vendor ID,
and AMD's vise versa, but we probably won't bother + once again there
is an issue of backward compatibility.

Feel free to look at the patch series, I documented fully the SMRAM layout
that KVM uses, including all the places when it differs from the real
thing.


>
> > Also I proposed to preserve the int shadow in internal kvm state and migrate
> > it in upper 4 bits of the 'shadow' field of struct kvm_vcpu_events.
> > Both Paolo and Sean proposed to store the int shadow in the SMRAM instead,
> > and you didn't object to this, and now after I refactored and implemented
> > the whole thing you suddently do.
>
> I did not see the prior conversations. I rarely get an opportunity to
> read the list.
I understand.

>
> > However AMD just recently posted a VNMI patch series to avoid
> > single stepping the CPU when NMI is blocked due to the same reason, because
> > it is fragile.
>
> The vNMI feature isn't available in any shipping processor yet, is it?
Yes, but one of its purposes is to avoid single stepping the guest,
which is especially painful on AMD, because there is no MTF, so
you have to 'borrow' the TF flag in the EFLAGS, and that can leak into
the guest state (e.g pushed onto the stack).


>
> > Do you really want KVM to single step the guest in this case, to deliver the #SMI?
> > I can do it, but it is bound to cause lot of trouble.
>
> Perhaps you could document this as a KVM erratum...one of many
> involving virtual SMI delivery.

Absolutely, I can document that we choose to save/restore the int shadow in
SMRAM, something that CPUs usually don't really do, but happens to be the best way
to deal with this corner case.

(Actually looking at clause of default treatment of SMIs in Intel's PRM,
they do mention that they preserve the int shadow somewhere at least
on some Intel's CPUs).


BTW, according to my observations, it is really hard to hit this problem,
because it looks like when the CPU is in interrupt shadow, it doesn't process
_real_ interrupts as well (despite the fact that in VM, real interrupts
should never be blocked(*), but yet, that is what I observed on both AMD and Intel.

(*) You can allow the guest to control the real EFLAGS.IF on both VMX and SVM,
(in which case int shadow should indeed work as on bare metal)
but KVM of course doesn't do it.

I observed that when KVM sends #SMI from other vCPU, it sends a vCPU kick,
and the kick never arrives inside the interrupt shadow.
I have seen it on both VMX and SVM.

What still triggers this problem, is that the instruction which is in the interrupt
shadow can still get a VM exit, (e.g EPT/NPT violation) and then it can notice
the pending SMI.

I think it has to be EPT/NPT violation btw, because, IMHO most if not all other VM exits I
think are instruction intercepts, which will cause KVM to emulate the instruction
and clear the interrupt shadow, and only after that it will enter SMM.

Even MMIO/IOPORT access is emulated by the KVM.

Its not the case with EPT/NPT violation, because the KVM will in this case re-execute
the instruction after it 'fixes' the fault.

Best regards,
Maxim Levitsky



>