Re: [PATCH v2] KVM: SEV-ES: Don't intercept MSR_IA32_DEBUGCTLMSR for SEV-ES guests

From: Ravi Bangoria
Date: Mon May 06 2024 - 00:49:33 EST


On 03-May-24 5:21 AM, Sean Christopherson wrote:
> On Tue, Apr 16, 2024, Ravi Bangoria wrote:
>> Currently, LBR Virtualization is dynamically enabled and disabled for
>> a vcpu by intercepting writes to MSR_IA32_DEBUGCTLMSR. This helps by
>> avoiding unnecessary save/restore of LBR MSRs when nobody is using it
>> in the guest. However, SEV-ES guest mandates LBR Virtualization to be
>> _always_ ON[1] and thus this dynamic toggling doesn't work for SEV-ES
>> guest, in fact it results into fatal error:
>>
>> SEV-ES guest on Zen3, kvm-amd.ko loaded with lbrv=1
>>
>> [guest ~]# wrmsr 0x1d9 0x4
>> KVM: entry failed, hardware error 0xffffffff
>> EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
>> ...
>>
>> Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests.
>
> Uh, what? I mean, sure, it works, maybe, I dunno. But there's a _massive_
> disconnect between the first paragraph and this statement.
>
> Oh, good gravy, it "works" because SEV already forces LBR virtualization.
>
> svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
>
> (a) the changelog needs to call that out.

Sorry, I should have called that out explicitly.

> (b) KVM needs to disallow SEV-ES if
> LBR virtualization is disabled by the admin, i.e. if lbrv=false.

That's what I initially thought. But since KVM currently allows booting SEV-ES
guests even when lbrv=0 (by silently ignoring lbrv value), erroring out would
be a behavior change.

> Alternatively, I would be a-ok simply deleting lbrv, e.g. to avoid yet more
> printks about why SEV-ES couldn't be enabled.
>
> Hmm, I'd probably be more than ok. Because AMD (thankfully, blessedly) uses CPUID
> bits for SVM features, the admin can disable LBRV via clear_cpuid (or whatever it's
> called now). And there are hardly any checks on the feature, so it's not like
> having a boolean saves anything. AMD is clearly committed to making sure LBRV
> works, so the odds of KVM really getting much value out of a module param is low.

Currently, lbrv is not enabled by default with model specific -cpu profiles in
qemu. So I guess this is not backward compatible?

> And then when you delete lbrv, please add a WARN_ON_ONCE() sanity check in
> sev_hardware_setup() (if SEV-ES is supported), because like the DECODEASSISTS
> and FLUSHBYASID requirements, it's not super obvious that LBRV is a hard
> requirement for SEV-ES (that's an understatment; I'm curious how some decided
> that LBR virtualization is where the line go drawn for "yeah, _this_ is mandatory").

I'm not sure. Some ES internal dependency.

In any case, the patch simply fixes 'missed clearing MSR Interception' for
SEV-ES guests. So, would it be okay to apply this patch as is and do lbrv
cleanup as a followup series?

PS: AMD Bus Lock Detect virtualization also dependents on LBR Virtualization:
https://lore.kernel.org/r/20240429060643.211-4-ravi.bangoria@xxxxxxx

Thanks for the review,
Ravi