Re: [PATCH] x86/PCI: Use MMCONFIG by default for KVM guests
From: Michael S. Tsirkin
Date: Wed Jul 22 2020 - 06:59:43 EST
On Wed, Jul 22, 2020 at 02:15:13AM +0200, Julia Suvorova wrote:
> Scanning for PCI devices at boot takes a long time for KVM guests. It
> can be reduced if KVM will handle all configuration space accesses for
> non-existent devices without going to userspace [1]. But for this to
> work, all accesses must go through MMCONFIG.
> This change allows to use pci_mmcfg as raw_pci_ops for 64-bit KVM
> guests making MMCONFIG the default access method.
>
> [1] https://lkml.org/lkml/2020/5/14/936
>
> Signed-off-by: Julia Suvorova <jusual@xxxxxxxxxx>
Thanks for the patch!
Some comments:
I guess the point is that on KVM, a single MMIO access through MMCFG is
faster than the two port I/O accesses needed for classic (type 1) access -
is that right? If so, it's worth mentioning in the commit log.
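To make the comparison concrete, here is a rough sketch of how the two access methods address a register. Type 1 needs an address write to port 0xCF8 followed by a data access at port 0xCFC, while MMCONFIG (ECAM) is a single load or store at an offset into the memory-mapped window. The function names are made up for illustration and are not the kernel's helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Type 1 (port I/O): this value is written to port 0xCF8, then the data
 * is read or written at port 0xCFC. Two accesses per config register.
 * Hypothetical helper name, not the kernel's macro.
 */
static uint32_t pci_conf1_address(unsigned bus, unsigned dev,
                                  unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 256);
    /* Enable bit 31, then bus/device/function, dword-aligned register. */
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8) | (reg & 0xfc);
}

/*
 * MMCONFIG (ECAM): one memory access at base + offset, where each
 * function gets a 4 KiB window. Hypothetical helper name.
 */
static uint64_t pci_ecam_offset(unsigned bus, unsigned dev,
                                unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return ((uint64_t)bus << 20) | (dev << 15) | (fn << 12) | reg;
}
```

On KVM each port or MMIO access to an emulated device is a trap, so halving the number of accesses per register is where a speedup would come from, assuming the guess above is right.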
> ---
> arch/x86/pci/direct.c | 5 +++++
> arch/x86/pci/mmconfig_64.c | 3 +++
> 2 files changed, 8 insertions(+)
>
> diff --git a/arch/x86/pci/direct.c b/arch/x86/pci/direct.c
> index a51074c55982..8ff6b65d8f48 100644
> --- a/arch/x86/pci/direct.c
> +++ b/arch/x86/pci/direct.c
> @@ -6,6 +6,7 @@
> #include <linux/pci.h>
> #include <linux/init.h>
> #include <linux/dmi.h>
> +#include <linux/kvm_para.h>
> #include <asm/pci_x86.h>
>
> /*
> @@ -264,6 +265,10 @@ void __init pci_direct_init(int type)
> {
> if (type == 0)
> return;
> +
> + if (raw_pci_ext_ops && kvm_para_available())
> + return;
> +
> printk(KERN_INFO "PCI: Using configuration type %d for base access\n",
> type);
> if (type == 1) {
> diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
> index 0c7b6e66c644..9eb772821766 100644
> --- a/arch/x86/pci/mmconfig_64.c
> +++ b/arch/x86/pci/mmconfig_64.c
> @@ -10,6 +10,7 @@
> #include <linux/init.h>
> #include <linux/acpi.h>
> #include <linux/bitmap.h>
> +#include <linux/kvm_para.h>
> #include <linux/rcupdate.h>
> #include <asm/e820/api.h>
> #include <asm/pci_x86.h>
> @@ -122,6 +123,8 @@ int __init pci_mmcfg_arch_init(void)
> }
>
> raw_pci_ext_ops = &pci_mmcfg;
> + if (kvm_para_available())
> + raw_pci_ops = &pci_mmcfg;
>
> return 1;
> }
The issue with anything like this is that it breaks if we ever do
config accesses that affect MMCONFIG itself, e.g. to move it, or if
disabling or sizing BARs on some device (e.g. the PCI bridge)
also disables MMCFG.
I guess the explanation for why this works with QEMU is basically
that, at least on QEMU right now, disabling memory on the root
device does not disable MMCFG, and Linux does not bother
tweaking the MMCFG range set up by the BIOS.
Some suggestions:
1. It's worth mentioning all this in the commit log.
2. How do we know the above will always be correct?
Something like checking the ID of the root device might be a good idea.
And given we know the ID, we can also make sure we don't
disable MMCFG. Does this make sense?
3. Another idea: how about preferring pcbios on kvm instead? That can do
what's appropriate for the platform ...
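To illustrate suggestion 2, a minimal sketch of what an ID check on the root device could look like. It assumes the only host bridge we trust is the one QEMU's Q35 machine exposes at 00:00.0 (vendor 0x8086, device 0x29c0); the macro and function names are hypothetical, and which IDs are actually safe would need to be decided in the real patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the IDs are those of QEMU's Q35 host bridge. */
#define ROOT_VENDOR_INTEL   0x8086
#define ROOT_DEVICE_Q35_MCH 0x29c0

/*
 * 'id' is the dword at config offset 0 of device 00:00.0:
 * vendor ID in the low 16 bits, device ID in the high 16 bits.
 * Returns true only for a root we assume keeps MMCFG enabled
 * regardless of its memory-enable or BAR state.
 */
static bool root_is_known_mmcfg_safe(uint32_t id)
{
    uint16_t vendor = id & 0xffff;
    uint16_t device = id >> 16;

    return vendor == ROOT_VENDOR_INTEL && device == ROOT_DEVICE_Q35_MCH;
}
```

The idea would be to switch raw_pci_ops to MMCFG only when such a check passes, and to fall back to the existing behavior for any root we don't recognize.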
> --
> 2.25.4