Re: Q. about KVM and CPU hotplug

From: Paolo Bonzini
Date: Tue Nov 30 2021 - 04:28:53 EST


On 11/30/21 09:27, Tian, Kevin wrote:
>         r = kvm_arch_hardware_enable();
>
>         if (r) {
>                 cpumask_clear_cpu(cpu, cpus_hardware_enabled);
>                 atomic_inc(&hardware_enable_failed);
>                 pr_info("kvm: enabling virtualization on CPU%d failed\n", cpu);
>         }
> }

> Upon error, hardware_enable_failed is incremented. However, this variable
> is checked only in hardware_enable_all(), which is called when the first
> VM is created.
>
> This implies that KVM may be left in a state where it doesn't know that a
> CPU is not ready to host VMX operations.
>
> Then I'm curious what will happen if a vCPU is scheduled to this CPU. Does
> KVM indirectly catch it (e.g. a failed VM entry) and return a deterministic
> error to QEMU at some point, or may it lead to undefined behavior? And is
> there any method to prevent a vCPU thread from being scheduled to the CPU?

It should fail the first vmptrld instruction. That will result in a few
WARN_ONCE and pr_warn_ratelimited messages (see vmx_insn_failed). For VMX
this would be a pretty bad firmware bug, and it has never been reported.
KVM did find some undocumented errata, but not this one!

I don't think there's any fix other than pinning from userspace. The WARNs
can be eliminated by calling KVM_BUG_ON in the sched_in notifier, plus
checking whether the VM is bugged before entering the guest or doing a
VMREAD/VMWRITE (usually the check is done only in an ioctl). But some
refactoring is probably needed to make the code more robust in general.
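
To make the idea concrete, here is a userspace toy model of "mark the VM
as bugged at sched-in, then refuse to enter the guest". It is not kernel
code and does not use KVM's real data structures: every name (toy_vm,
toy_cpu_enabled, toy_sched_in, toy_enter_guest, TOY_NR_CPUS) and the
choice of -EIO as the error are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_CPUS 8                   /* hypothetical CPU count */

/* Hypothetical stand-in for a VM; only the "bugged" state is modelled. */
struct toy_vm {
    bool bugged;                        /* sticky: set once, never cleared */
};

/* Hypothetical stand-in for the mask of CPUs that enabled virtualization. */
bool toy_cpu_enabled[TOY_NR_CPUS];

/* Models the sched-in hook: runs when a vCPU thread lands on `cpu`. */
void toy_sched_in(struct toy_vm *vm, int cpu)
{
    assert(vm != NULL);
    if (cpu < 0 || cpu >= TOY_NR_CPUS || !toy_cpu_enabled[cpu])
        vm->bugged = true;              /* CPU cannot host the guest */
}

/* Models the check before guest entry: a bugged VM is never entered. */
int toy_enter_guest(const struct toy_vm *vm)
{
    assert(vm != NULL);
    if (vm->bugged)
        return -EIO;                    /* deterministic error to the caller */
    return 0;                           /* the real guest entry would go here */
}
```

In this model the error is sticky: once a vCPU has been scheduled onto a
CPU that is not enabled, every later entry attempt fails, even after the
thread migrates back to a good CPU.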

Paolo

> By design, the current generation of TDX doesn't support CPU hotplug. Only
> boot-time CPUs can be initialized for TDX (and this must be done en masse
> in one breath). Attempting to do seamcalls on a hotplugged CPU simply
> fails, thus it potentially affects any trusted domain whose vCPUs are
> scheduled to the plugged CPU.
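
Keeping vCPU threads off a CPU that cannot run them is, per the reply
above, a matter of pinning from userspace. A minimal sketch, assuming
Linux with glibc and that the thread doing the pinning is the one to be
restricted; the helper names mask_without_cpu and pin_self_away_from are
made up for this example, while sched_getaffinity(), sched_setaffinity()
and the CPU_* macros are the standard glibc interfaces.

```c
#define _GNU_SOURCE             /* cpu_set_t, CPU_* macros, sched_*affinity() */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stddef.h>

/*
 * Copy `allowed` into `out` with `bad_cpu` removed.
 * Returns the number of CPUs left in `out`.
 */
int mask_without_cpu(const cpu_set_t *allowed, int bad_cpu, cpu_set_t *out)
{
    assert(allowed != NULL && out != NULL);
    *out = *allowed;
    if (bad_cpu >= 0 && bad_cpu < CPU_SETSIZE)
        CPU_CLR(bad_cpu, out);
    return CPU_COUNT(out);
}

/*
 * Restrict the calling thread to its current affinity mask minus `bad_cpu`.
 * Returns 0 on success, -1 with errno set on failure.
 */
int pin_self_away_from(int bad_cpu)
{
    cpu_set_t cur, next;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(cur), &cur) != 0)
        return -1;
    if (mask_without_cpu(&cur, bad_cpu, &next) == 0) {
        errno = EINVAL;         /* refuse to leave the thread with no CPU */
        return -1;
    }
    return sched_setaffinity(0, sizeof(next), &next);
}
```

A VMM would call something like pin_self_away_from() from each vCPU
thread before it first enters the guest. Note that this only narrows the
affinity mask as seen at the time of the call; it does not react to CPUs
that are plugged in later.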