Re: [PATCH 2/2] KVM: x86: Forbid KVM_SET_CPUID{,2} after KVM_RUN

From: Vitaly Kuznetsov
Date: Mon Jan 17 2022 - 04:56:03 EST


Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> On Fri, Jan 14, 2022, Vitaly Kuznetsov wrote:
>> Igor Mammedov <imammedo@xxxxxxxxxx> writes:
>>
>> > On Fri, 14 Jan 2022 10:31:50 +0100
>> > Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>> >
>> >> Igor Mammedov <imammedo@xxxxxxxxxx> writes:
>> >>
>> >>
>> >> > However, a problem of failing KVM_SET_CPUID2 during VCPU re-plug
>> >> > is still there and re-plug will fail if KVM rejects repeated KVM_SET_CPUID2
>> >> > even if ioctl called with exactly the same CPUID leafs as the 1st call.
>> >> >
>> >>
>> >> Assuming APIC id change does not need to be supported, I can send v2
>> >> here with an empty allowlist.
>> > As you mentioned in another thread black list would be better
>> > to address Sean's concerns or just revert problematic commit.
>> >
>>
>> Personally, I'm leaning towards the blocklist approach even if just for
>> 'documenting' the fact that KVM doesn't correctly handle the
>> change. Compared to a comment in the code, such an approach could help
>> someone save tons of debugging time (if anyone ever decides to do something
>> weird, like changing MAXPHYADDR on the fly).
>
> I assume the blocklist approach is to let userspace opt into rejecting KVM_SET_CPUID{,2},
> but allow all CPUID leafs and sub-leafs to be modified at will by
> default?

No, honestly I was thinking about something much simpler: instead of
forbidding KVM_SET_CPUID{,2} after KVM_RUN completely (what we have now
in 5.16), we only forbid changing the specific data that we know breaks
assumptions in the MMU. From the comment:
"
* KVM does not correctly handle changing guest CPUID after KVM_RUN, as
* MAXPHYADDR, GBPAGES support, AMD reserved bit behavior, etc.. aren't
* tracked in kvm_mmu_page_role. As a result, KVM may miss guest page
* faults due to reusing SPs/SPTEs.
"
It seems that the CPU hotplug path doesn't need to change any of these,
so we don't need an opt-in/opt-out: we can just forbid changing these
particular things for the time being. Alternatively, we could silently
ignore such changes, but I don't quite like that because it would mask
bugs in VMMs.
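
To illustrate the idea, here is a rough standalone sketch, not actual KVM
code. 'struct cpuid_entry' is a simplified stand-in for
struct kvm_cpuid_entry2, and find_entry(), maxphyaddr(), has_gbpages() and
check_cpuid_update() are hypothetical names. It only looks at MAXPHYADDR
(CPUID.0x80000008:EAX[7:0]) and GBPAGES (CPUID.0x80000001:EDX[26]); a real
check would also have to cover the AMD reserved bit behavior and whatever
hides behind the "etc." in the comment. Falling back to 36 bits when leaf
0x80000008 is absent and returning -EINVAL are assumptions as well.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct kvm_cpuid_entry2 (illustration only). */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Return the entry for a leaf/sub-leaf, or NULL if it is not present. */
static const struct cpuid_entry *find_entry(const struct cpuid_entry *e,
					    size_t n, uint32_t function,
					    uint32_t index)
{
	for (size_t i = 0; i < n; i++)
		if (e[i].function == function && e[i].index == index)
			return &e[i];
	return NULL;
}

/* MAXPHYADDR is CPUID.0x80000008:EAX[7:0]; assume 36 bits if absent. */
static uint32_t maxphyaddr(const struct cpuid_entry *e, size_t n)
{
	const struct cpuid_entry *ent = find_entry(e, n, 0x80000008, 0);

	return ent ? (ent->eax & 0xff) : 36;
}

/* GBPAGES (1 GiB pages) is CPUID.0x80000001:EDX[26]. */
static bool has_gbpages(const struct cpuid_entry *e, size_t n)
{
	const struct cpuid_entry *ent = find_entry(e, n, 0x80000001, 0);

	return ent && (ent->edx & (1u << 26));
}

/*
 * Hypothetical check: before the first KVM_RUN anything is accepted.
 * Afterwards, an update is rejected only if it changes data the MMU
 * depends on; all other leaves may still be modified.
 */
static int check_cpuid_update(const struct cpuid_entry *cur, size_t n_cur,
			      const struct cpuid_entry *upd, size_t n_upd,
			      bool has_run)
{
	if (!has_run)
		return 0;

	if (maxphyaddr(cur, n_cur) != maxphyaddr(upd, n_upd))
		return -EINVAL;

	if (has_gbpages(cur, n_cur) != has_gbpages(upd, n_upd))
		return -EINVAL;

	return 0;
}
```

With something along these lines, a re-plug that repeats KVM_SET_CPUID2
with the same data passes, while a VMM that changes MAXPHYADDR on the fly
gets an error instead of silently broken SPTEs.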

> I don't dislike the idea, but I wonder if it's unnecessarily fancy.
>
> What if we instead provide an ioctl/capability to let userspace toggle disabling
> of KVM_SET_CPUID{,2}, a la STAC/CLAC to override SMAP? E.g. QEMU could enable
> protections after initially creating the vCPU, then temporarily
> disable protections only for the hotplug path?
>
> That'd provide solid protections for minimal effort, and if userspace can restrict
> the danger zone to one specific path, then userspace can easily do its own auditing
> for that one path.

That could work, but it seems the protection would only "protect" the
VMM from shooting itself in the foot, and would likely result in killing
the guest anyway, so I'm wondering if it's worth it.
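
For comparison, the toggle flow could be modelled roughly like this. It is
a toy model only: 'struct vcpu_state', vcpu_set_cpuid_protection() and
vcpu_set_cpuid() are hypothetical, no such ioctl/capability exists today,
and the -EINVAL return value is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state; these are not real KVM structures. */
struct vcpu_state {
	bool cpuid_protected;		/* set/cleared by the toggle */
	unsigned int cpuid_updates;	/* counts accepted CPUID updates */
};

/* Stand-in for the proposed ioctl/capability that flips the protection. */
static void vcpu_set_cpuid_protection(struct vcpu_state *v, bool on)
{
	v->cpuid_protected = on;
}

/* Stand-in for KVM_SET_CPUID{,2}: rejected while protection is enabled. */
static int vcpu_set_cpuid(struct vcpu_state *v)
{
	if (v->cpuid_protected)
		return -EINVAL;

	v->cpuid_updates++;
	return 0;
}
```

The VMM would enable the protection after initial vCPU setup and drop it
only around the hotplug path, so any other KVM_SET_CPUID{,2} call fails
loudly.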

--
Vitaly