Re: [PATCH v8 003/103] KVM: Refactor CPU compatibility check on module initialization

From: Isaku Yamahata
Date: Tue Aug 23 2022 - 01:27:20 EST


On Fri, Aug 12, 2022 at 11:35:29AM +0000,
"Huang, Kai" <kai.huang@xxxxxxxxx> wrote:

> > 3. Provide arch hooks that are invoked for "power management" operations (including
> > CPU hotplug and host reboot, hence the quotes). Note, there's both a platform-
> > wide PM notifier and a per-CPU notifier...
> >
> > 4. Rename kvm_arch_post_init_vm() to e.g. kvm_arch_add_vm(), call it under
> > kvm_lock, and pass in kvm_usage_count.
> >
> > 5a. Drop cpus_hardware_enabled and drop the common hardware enable/disable code.
> >
> > or
> >
> > 5b. Expose kvm_hardware_enable_all() and/or kvm_hardware_enable() so that archs
> > don't need to implement their own error handling and per-CPU flags.
> >
> > I.e. give each architecture hooks to handle possible transition points, but otherwise
> > let arch code decide when and how to do hardware enabling/disabling.
> >
> > I'm very tempted to vote for (5a); x86 is the only architecture has an error path
> > in kvm_arch_hardware_enable(), and trying to get common code to play nice with arm's
> > kvm_arm_hardware_enabled logic is probably going to be weird.
> >

I ended up with (5a) with the following RFC patches.
https://lore.kernel.org/kvm/cover.1660974106.git.isaku.yamahata@xxxxxxxxx/T/#m0239e7800b66174b49c5b1049462aad50293a994


> > E.g. if we can get the back half kvm_create_vm() to look like the below, then arch
> > code can enable hardware during kvm_arch_add_vm() if the existing count is zero
> > without generic KVM needing to worry about when hardware needs to be enabled and
> > disabled.
> >
> > r = kvm_arch_init_vm(kvm, type);
> > if (r)
> > goto out_err_no_arch_destroy_vm;
> >
> > r = kvm_init_mmu_notifier(kvm);
> > if (r)
> > goto out_err_no_mmu_notifier;
> >
> > /*
> > * When the fd passed to this ioctl() is opened it pins the module,
> > * but try_module_get() also prevents getting a reference if the module
> > * is in MODULE_STATE_GOING (e.g. if someone ran "rmmod --wait").
> > */
> > if (!try_module_get(kvm_chardev_ops.owner)) {
> > r = -ENODEV;
> > goto out_err;
> > }
> >
> > mutex_lock(&kvm_lock);
> > cpus_read_lock();
> > r = kvm_arch_add_vm(kvm, kvm_usage_count);
>
> Holding cpus_read_lock() here implies CPU hotplug cannot happen during
> kvm_arch_add_vm(). This needs a justification/comment to explain why.  
>
> Also, assuming we have a justification, since (based on your description above)
> arch _may_ choose to enable hardware within it, but it is not a _must_. So
> maybe remove cpus_read_lock() here and let kvm_arch_add_vm() to decide whether
> to use it?

For now, I put locking outside of kvm_arch_{add, del}_vm(). But I haven't looked
at non-x86 arch. Probably arm code has its preference for its implementation.
--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>