Re: [PATCH v2 1/3] KVM: x86: Deflect unknown MSR accesses to user space

From: Alexander Graf
Date: Thu Jul 30 2020 - 19:08:08 EST




On 31.07.20 00:42, Jim Mattson wrote:

On Wed, Jul 29, 2020 at 4:59 PM Alexander Graf <graf@xxxxxxxxxx> wrote:

MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine tuned against
certain CPU models and MSRs that contain information on the package level
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not in the first category, but are still
handled by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run productive VMs in parallel to experimental ones where you don't care
about proper MSR handling.
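
To make the intended split concrete, here is a minimal sketch of what a user space handler for deflected MSR accesses could look like. It does not use the interface from this patch: struct msr_exit, struct vm_policy, handle_msr_exit() and MSR_EXAMPLE_PLATFORM_INFO are hypothetical names invented for illustration, and the per-VM "ignore unknown MSRs" flag only stands in for the idea of replacing the global ignore_msrs switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one MSR access that was deflected to user space. */
struct msr_exit {
    uint32_t index;    /* MSR number the guest passed in ECX */
    uint64_t data;     /* WRMSR: value written; RDMSR: value to return */
    bool     is_write; /* true for WRMSR, false for RDMSR */
    bool     error;    /* set by user space: true means "inject #GP" */
};

/* Hypothetical per-VM policy, standing in for the global ignore_msrs knob. */
struct vm_policy {
    bool ignore_unknown_msrs;
};

/* Example of a package-level MSR (0xce is MSR_PLATFORM_INFO). */
#define MSR_EXAMPLE_PLATFORM_INFO 0x000000ceu

/* Decide the outcome of one deflected access according to the VM's policy. */
static void handle_msr_exit(const struct vm_policy *policy,
                            struct msr_exit *exit)
{
    assert(policy != NULL && exit != NULL);
    exit->error = false;

    switch (exit->index) {
    case MSR_EXAMPLE_PLATFORM_INFO:
        if (exit->is_write)
            exit->error = true;  /* treated as read-only in this sketch */
        else
            exit->data = 0;      /* placeholder value, not a real layout */
        return;
    default:
        if (policy->ignore_unknown_msrs) {
            /* Experimental VM: reads return 0, writes are dropped. */
            if (!exit->is_write)
                exit->data = 0;
        } else {
            /* Production VM: unknown MSRs fault in the guest. */
            exit->error = true;
        }
    }
}
```

With two vm_policy instances, one strict and one lax, the same handler gives a production VM a #GP on unknown MSRs while an experimental VM next to it silently ignores them.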

Signed-off-by: Alexander Graf <graf@xxxxxxxxxx>

Can we just drop em_wrmsr and em_rdmsr? The in-kernel emulator is
already incomplete, and I don't think there is ever a good reason for
kvm to emulate RDMSR or WRMSR if the VM-exit was for some other reason
(and we shouldn't end up here if the VM-exit was for RDMSR or WRMSR).
Am I missing something?

On certain combinations of CPUs and guest modes, at least real mode on pre-Nehalem(?) CPUs, we run all guest code through the emulator and thus may encounter a RDMSR or WRMSR instruction there. I *think* we also do so for big real mode on more modern CPUs, but I'm not 100% sure.

You seem to be assuming that the instruction at CS:IP will still be
RDMSR (or WRMSR) after returning from userspace, and we will come
through kvm_{get,set}_msr_user_space again at the next KVM_RUN. That
isn't necessarily the case, for a variety of reasons. I think the

Do you have a particular situation in mind where that would not be the case and where we would still want to actually complete an MSR operation after the environment changed?

'completion' of the userspace instruction emulation should be done
with the complete_userspace_io [sic] mechanism instead.

Hm, that would avoid a roundtrip into guest mode, but add a cycle through the in-kernel emulator. I'm not sure that's a net win quite yet.
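
For readers following along, the completion-callback idea being discussed can be modelled roughly as below. This is a toy sketch, not KVM code: struct toy_vcpu, complete_rdmsr(), exit_to_user_for_rdmsr() and toy_vcpu_run() are made-up names, and only the concept of a complete_userspace_io function pointer that runs on the next entry is taken from the discussion. The point is that the kernel finishes the instruction it exited for, instead of re-executing whatever is at CS:IP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vcpu;
typedef int (*completion_fn)(struct toy_vcpu *vcpu);

/* Toy vCPU state; the field names are illustrative only. */
struct toy_vcpu {
    uint64_t rip, rax, rdx;
    completion_fn complete_userspace_io; /* pending completion, or NULL */
    uint64_t msr_data;   /* filled in by user space for RDMSR */
    bool     msr_error;  /* user space asked for a fault */
    bool     pending_gp; /* a #GP is queued for injection */
};

/* Finish a RDMSR whose value was supplied by user space. */
static int complete_rdmsr(struct toy_vcpu *vcpu)
{
    if (vcpu->msr_error) {
        vcpu->pending_gp = true;  /* fault: RIP is not advanced */
        return 1;
    }
    vcpu->rax = (uint32_t)vcpu->msr_data; /* low 32 bits go to EAX */
    vcpu->rdx = vcpu->msr_data >> 32;     /* high 32 bits go to EDX */
    vcpu->rip += 2;                       /* RDMSR is two bytes: 0F 32 */
    return 1;
}

/* Called when the access is deflected: remember how to finish it. */
static void exit_to_user_for_rdmsr(struct toy_vcpu *vcpu)
{
    vcpu->complete_userspace_io = complete_rdmsr;
}

/* Next run request: run the pending completion before anything else. */
static int toy_vcpu_run(struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
    if (vcpu->complete_userspace_io) {
        completion_fn fn = vcpu->complete_userspace_io;
        vcpu->complete_userspace_io = NULL;
        return fn(vcpu);
    }
    return 0; /* nothing pending in this toy model */
}
```

In this model the result is applied exactly once, even if the guest-visible instruction stream changed while user space was handling the exit.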


I'd really like to see this mechanism apply only in the case of
invalid/unknown MSRs, and not for illegal reads/writes as well.

Why? Any #GP-inducing MSR access will be on the slow path anyway. What's the problem if a few more of them reach user space, which just bounces them back as failing so that they actually do inject a fault?

Alex


