Re: [PATCH v7 00/22] Support SDEI Virtualization

From: Paolo Bonzini
Date: Tue Jul 05 2022 - 03:44:35 EST


On 6/24/22 15:12, Marc Zyngier wrote:
> - as far as I know, the core Linux/arm64 maintainers have no plan to
>   support APF. Without it, this is a pointless exercise. And even with
>   it, this introduces a Linux specific behaviour in an otherwise
>   architectural hypervisor (something I'm quite keen on avoiding)

Regarding non-architectural behavior, isn't that the same already for PTP? I understand that the PTP hypercall is a much smaller implementation than SDEI+APF, but it goes to show that KVM is already not "architectural".

There are other cases where paravirtualized solutions can be useful. PTP is one, but there are more where KVM/ARM does not have a solution yet, for example lock holder preemption. Unless ARM (the company) has a way to receive input from developers and standardize the interface, similar to the RISC-V SIGs, vendor-specific hypercalls are a sad fact of life. It just happened that until now KVM/ARM hasn't seen much use in some cases (such as desktop virtualization) where overcommitted hosts are more common.

Async page faults per se are not KVM specific; in fact, Linux supported them for the IBM s390 hypervisor long before KVM added support. They didn't exist on x86 and ARM, so the developers came up with a new hypercall API, and for x86 it honestly wasn't great. For ARM we learnt from the mistakes, and it seems to me that SDEI is a good match for the feature. If ARM wants to produce a standard interface for APF, whether based on SDEI or something else, we're all ears.

Regarding plans of core arm64 maintainers to support async page fault, can you provide a pointer to the discussion? I agree that if there's a hard NACK for APF for whatever reason, the whole host-side code is pointless (including SDEI virtualization); but I would like to read more about it.

> - It gives an incentive to other hypervisor vendors to add random crap
>   to the Linux mm subsystem, which is even worse. At this stage, we
>   might as well go back to the Xen PV days altogether.

return -EGREGIOUS;

Since you mention hypervisor vendors and there's only one hypervisor in Linux, I guess you're not talking about the host mm/ subsystem (otherwise yeah, FOLL_NOWAIT is only used by KVM async page faults).

So I suppose you're talking about the guest, and then yeah, it sucks to have multiple hypervisors providing the same functionality in different ways (or multiple hypervisors providing different subsets of PV functionality). It happens on x86 with Hyper-V and KVM, and to a lesser extent Xen and VMware.

But again, KVM/ARM has already crossed that bridge with PTP support, and the guest needs exactly zero code in the Linux mm subsystem (both generic and arch-specific) to support asynchronous page faults. There are 20 lines of code in do_notify_resume(), and the rest is just SDEI gunk. Again, I would be happy to get a pointer to concrete objections from the Linux ARM64 maintainers. Maybe a different implementation is possible, I don't know.

In any case it's absolutely not comparable to Xen PV, and you know it.

> - I haven't seen any of the KVM/arm64 users actually asking for the
>   APF horror, and the cloud vendors I directly asked had no plan to
>   use it, and not using it on their x86 systems either

Please define "horror" in more technical terms. And since this is the second time I'm calling you out on this, I'm also asking you to avoid hyperbole and similar rhetorical gimmicks in the future.

That said: Peter, Sean, Google uses or used postcopy extensively on GCE (https://dl.acm.org/doi/pdf/10.1145/3296975.3186415). If Google doesn't use async page faults on x86, do you have any insights on why?

> - no performance data nor workloads that could help making an informed
>   decision have been disclosed, and the only argument in its favour
>   seems to be "but x86 has it" (hardly a compelling one)

Again, this is just false: numbers have been posted (https://lwn.net/ml/linux-kernel/20210209050403.103143-1-gshan@xxxxxxxxxx/ was the first result that came up from a quick mailing list search). If they are not enough, please be more specific.

Thanks,

Paolo