Re: [PATCH] MMIO should have more priority than IO
From: Matthew Wilcox
Date: Fri Jul 08 2022 - 08:57:31 EST
On Fri, Jul 08, 2022 at 05:56:07AM +0000, Ajay Kaher wrote:
>
> On 28/06/22, 11:39 PM, "Bjorn Helgaas" <helgaas@xxxxxxxxxx> wrote:
> > [+cc Matthew]
> >
> > On Tue, Jun 28, 2022 at 09:59:21PM +0530, Ajay Kaher wrote:
> >> Port IO instructions (PIO) are less efficient than MMIO (memory
> >> mapped I/O). They require twice as many PCI accesses, and PIO
> >> instructions are serializing. As a result, MMIO should be preferred
> >> when possible over PIO.
> >>
> >> Bare metal test result
> >> 1 million reads using raw_pci_read() took:
> >> PIO: 0.433153 Sec.
> >> MMIO: 0.268792 Sec.
> >>
> >> Virtual Machine test result
> >> 1 hundred thousand reads using raw_pci_read() took:
> >> PIO: 12.809 Sec.
> >> MMIO: 8.517 Sec.
While this is true, we're talking about config space accesses. These
should be rare. What workload does this make any measurable difference
to?
And looking at the results above, it's not so much PIO vs MMIO that
makes the difference, it's the virtualisation: an MMIO config access
goes from ~269ns on bare metal to ~85us in the VM. Rather than messing
around with preferring MMIO over PIO for config space, having an
"enlightenment" (a paravirtual interface) to do config space accesses
would be a more profitable path.
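To sketch what I mean (illustrative only; vmware_hypercall_pci_read()
and vmware_hypercall_pci_write() are hypothetical guest-to-hypervisor
wrappers, not existing interfaces), the guest could install its own
accessors:

	static int vmware_pci_read(unsigned int domain, unsigned int bus,
				   unsigned int devfn, int reg, int len, u32 *val)
	{
		/* One hypercall instead of a trapped PIO or MMIO access. */
		return vmware_hypercall_pci_read(domain, bus, devfn, reg, len, val);
	}

	static int vmware_pci_write(unsigned int domain, unsigned int bus,
				    unsigned int devfn, int reg, int len, u32 val)
	{
		return vmware_hypercall_pci_write(domain, bus, devfn, reg, len, val);
	}

	static const struct pci_raw_ops vmware_pci_ops = {
		.read	= vmware_pci_read,
		.write	= vmware_pci_write,
	};

Point raw_pci_ops and raw_pci_ext_ops at that from the hypervisor init
code and the PIO-vs-MMIO ordering question disappears entirely for that
platform.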
> >> Signed-off-by: Ajay Kaher <akaher@xxxxxxxxxx>
> >> ---
> >> arch/x86/pci/common.c | 8 ++++----
> >> 1 files changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> >> index 3507f456f..0b3383d9c 100644
> >> --- a/arch/x86/pci/common.c
> >> +++ b/arch/x86/pci/common.c
> >> @@ -40,20 +40,20 @@ const struct pci_raw_ops *__read_mostly raw_pci_ext_ops;
> >> int raw_pci_read(unsigned int domain, unsigned int bus, unsigned int devfn,
> >> int reg, int len, u32 *val)
> >> {
> >> + if (raw_pci_ext_ops)
> >> + return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> >> if (domain == 0 && reg < 256 && raw_pci_ops)
> >> return raw_pci_ops->read(domain, bus, devfn, reg, len, val);
> >> - if (raw_pci_ext_ops)
> >> - return raw_pci_ext_ops->read(domain, bus, devfn, reg, len, val);
> >> return -EINVAL;
> >
> > This organization of raw_pci_read() dates to b6ce068a1285 ("Change
> > pci_raw_ops to pci_raw_read/write"), by Matthew. Cc'd him for
> > comment, since I think he considered the ordering at the time.
>
> Thanks, Bjorn, for the quick response.
>
> Matthew, b6ce068a1285 is an old commit. It would be very helpful if you
> could provide some detail on the ordering, as Bjorn mentioned above.
Sorry for the delay; this came in while I was on holiday.
I'll note that b6ce068a1285 does _not_ change whether MMIO or PIO is
used for config space accesses below offset 256. Yes, it introduces
this code, but it also removes lines like this:
- if (reg < 256)
- return pci_conf1_read(seg,bus,devfn,reg,len,value);
from the MMCONFIG accessors.
Those were introduced in a0ca99096094 by Ivan Kokshaysky. But if you
look further back in the file history, you can find all kinds of nasty
bugs; broken BIOSes, resource conflicts, bleh. You'd hope they've all
been fixed by now, but do you want to bet?
I still have a working machine here which hung when using MMCONFIG for all
accesses. The problem lay in, IIRC, the graphics BAR. When attempting
to size it (by writing all-ones to it and seeing which bits remained
zero), the temporarily programmed BAR happened to overlay the MMCONFIG
area. That meant that the subsequent attempt to write to the BAR
actually ended up being a write to graphics memory ... and so did all
subsequent MMCONFIG accesses.
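For anyone who hasn't met the sizing dance, it goes roughly like this
(a sketch only; locking, 64-bit BARs and I/O BARs omitted):

	u32 orig, sz, size;

	/* Save the BAR, write all-ones, read back: the bits that stay
	 * zero encode the size of the region the BAR decodes. */
	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &orig);
	pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, ~0);
	pci_read_config_dword(dev, PCI_BASE_ADDRESS_0, &sz);

	/* This restore is the access that went astray on my machine:
	 * with all-ones programmed, the graphics BAR decoded a region
	 * that covered the MMCONFIG window, so an MMCONFIG-based config
	 * write landed in graphics memory instead of config space. */
	pci_write_config_dword(dev, PCI_BASE_ADDRESS_0, orig);

	size = ~(sz & (u32)PCI_BASE_ADDRESS_MEM_MASK) + 1;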
In short, here be dragons, and you need to move very VERY carefully to
avoid breaking people's machines.
We do have the possibility of a white-list approach. We can set
'raw_pci_ops' to NULL on machines where we're certain MMCONFIG works.
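Something along these lines (a sketch; the DMI entry is made up, and in
reality you'd want firmware-level validation too):

	/* Hypothetical allow-list of machines where MMCONFIG is known
	 * to work for the whole of config space. */
	static const struct dmi_system_id mmconfig_trusted[] = {
		{
			.ident = "Known-good platform",
			.matches = {
				DMI_MATCH(DMI_SYS_VENDOR, "ExampleVendor"),
			},
		},
		{ }
	};

	if (raw_pci_ext_ops && dmi_check_system(mmconfig_trusted))
		raw_pci_ops = NULL;	/* everything goes via MMCONFIG */

With raw_pci_ops NULL, raw_pci_read() falls straight through to
raw_pci_ext_ops even for domain 0, reg < 256.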
I still think you're better off having a special raw_pci_ops for VMware
than you are dinking about trying to save 50% of the time. If any of
this is really worth it at all, which I doubt.