Re: [PATCH v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests

From: Sean Christopherson
Date: Thu May 02 2024 - 19:51:59 EST


On Tue, Apr 16, 2024, Ravi Bangoria wrote:
> Currently, LBR Virtualization is dynamically enabled and disabled for
> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by
> avoiding unnecessary save/restore of LBR MSRs when nobody is using it
> in the guest. However, SEV-ES guest mandates LBR Virtualization to be
> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES
> guest, in fact it results into fatal error:
>
> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1
>
> [guest ~]# wrmsr 0x1d9 0x4
> KVM: entry failed, hardware error 0xffffffff
> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
> ...
>
> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.

Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_
disconnect between the first paragraph and this statement.

Oh, good gravy, it "works" because SEV already forces LBR virtualization.

	svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;

(a) the changelog needs to call that out. (b) KVM needs to disallow SEV-ES if
LBR virtualization is disabled by the admin, i.e. if lbrv=false.
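
To spell out (b), here is a rough user-space model of the gating, not kernel
code.  The struct and function names are made up for illustration; the real
check would live wherever KVM decides whether SEV-ES can be enabled.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for KVM's view of the platform.  Neither the
 * struct nor its fields exist in the kernel under these names.
 */
struct svm_caps {
	bool lbrv;	/* LBR virtualization usable (admin setting and CPU support) */
	bool hw_sev_es;	/* hardware/firmware report SEV-ES support */
};

/*
 * SEV-ES guests require LBR virtualization to be always on, so SEV-ES
 * must be refused whenever LBR virtualization is unavailable.
 */
bool sev_es_allowed(const struct svm_caps *caps)
{
	if (!caps->hw_sev_es)
		return false;

	return caps->lbrv;
}
```

I.e. lbrv=false on SEV-ES capable hardware yields "no SEV-ES", instead of a
guest that dies on its first DEBUGCTL write.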

Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more
printks about why SEV-ES couldn't be enabled.

Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID
bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's
called now). And there are hardly any checks on the feature, so it's not like
having a boolean saves anything. AMD is clearly committed to making sure LBRV
works, so the odds of KVM really getting much value out of a module param are low.

And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in
sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS
and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard
requirement for SEV-ES (that's an understatement; I'm curious how someone decided
that LBR virtualization is where the line got drawn for "yeah, _this_ is mandatory").
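
A sketch of the shape of that sanity check, again as a user-space model and
not the actual sev_hardware_setup() code.  The macro and function names are
hypothetical, and lumping all three features into one SEV-ES check is a
simplification; the bit positions are those of the SVM feature leaf, CPUID
Fn8000_000A EDX.  In the kernel, each missing feature would trip a
WARN_ON_ONCE() rather than silently returning false.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID Fn8000_000A EDX (SVM feature identification). */
#define SVM_FEAT_LBRV		(1u << 1)
#define SVM_FEAT_FLUSHBYASID	(1u << 6)
#define SVM_FEAT_DECODEASSISTS	(1u << 7)

/*
 * Return true only if every feature treated here as a hard requirement
 * for SEV-ES is reported in the given EDX value.
 */
bool sev_es_prereqs_present(uint32_t svm_edx)
{
	const uint32_t required = SVM_FEAT_LBRV |
				  SVM_FEAT_FLUSHBYASID |
				  SVM_FEAT_DECODEASSISTS;

	return (svm_edx & required) == required;
}
```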