I'll rephrase. What are the substantial benefits that this offers
over PCI?
Simplicity and optimization. You don't need most of the junk that comes
with PCI. It's all overhead and artificial constraints. You really only
need things like a handful of hypercall verbs, and that's it.
Second of all, I want to use vbus for other things that do not speak PCI
natively (like userspace, for instance... and if I am gleaning this
correctly, lguest doesn't either).

And virtio supports lguest and s390. virtio is not PCI specific.

I understand that. We keep getting wrapped around the axle on this
one. At some point in the discussion we were talking about supporting
the existing guest ABI without changing the guest at all. So while I
totally understand that virtio can work over various transports, I am
referring to what would be needed to have existing-ABI guests work with
an in-kernel version. This may or may not be an actual requirement.
However, for the PC platform, PCI has distinct advantages. What
advantages does vbus have for the PC platform?

To reiterate: IMO, simplicity and optimization. It's designed
specifically for PV use, which is software-to-software.
PCI sounds good at first, but I believe it's a false economy. It was
designed, of course, to be a hardware solution, so it carries all this
baggage derived from hardware constraints that simply do not exist in a
pure software world and that have to be emulated. Things like the
fixed-length, centrally managed PCI-IDs,

Not a problem in practice.

Perhaps, but it's just one more constraint that isn't actually needed.
It's like the cvs vs git debate. Why have it centrally managed when you
don't technically need to? Sure, centrally managed works, but I'd
rather not deal with it if there were a better option.
PIO config cycles, BARs, pci-irq-routing, etc.

What are the problems with these?
1) PIOs are still less efficient to decode than a hypercall vector. We
don't need to pretend we are hardware; the guest already knows what's
underneath. Use the most efficient call method.

2) BARs? No one in their right mind should use an MMIO BAR for PV. :)
The last thing we want to do is cause page faults here. Don't use them,
period. (This is where something like the vbus::shm() interface comes in.)

3) pci-irq routing was designed to accommodate etch constraints on a
piece of silicon that doesn't actually exist in kvm. Why would I want
to pretend I have PCI A,B,C,D lines that route to a pin on an IOAPIC?
Forget all that stuff and just inject an IRQ directly. This gets much
better with MSI, I admit, but you hopefully catch my drift now.
One of my primary design objectives with vbus was to a) reduce the
signaling as much as possible, and b) reduce the cost of signaling. That is why I do things like use explicit hypercalls, aggregated
interrupts, bidir napi to mitigate signaling, the shm_signal::pending
mitigation, and avoiding going to userspace by running in the kernel. All of these things together help to form what I envision would be a
maximum performance transport. Not all of these tricks are
interdependent (for instance, the bidir + full-duplex threading that I
do can be done in userspace too, as discussed). They are just the
collective design elements that I think we need to make a guest perform
very close to its peak. That is what I am after.
You are right, it's not strictly necessary to work. It just presents
the opportunity to optimize as much as possible and to move away from
legacy constraints that no longer apply. And since PV's sole purpose is
optimization, I was not really interested in going "half-way".
We need a positive advantage, we don't do things just because we can
(and then lose the real advantages PCI has).
Agreed, but I assert there are advantages. You may not think they
outweigh the cost, and that's your prerogative, but I think they are
still there nonetheless.
If we insist that PCI is the only interface we can support and we want
to do something, say, in the kernel for instance, we have to have either
something like the ICH model in the kernel (and really all of the pci
chipset models that qemu supports), or a hacky hybrid userspace/kernel
solution. I think this is what you are advocating, but I'm sorry, IMO
that's just gross and unnecessary gunk.

If we go for a kernel solution, a hybrid solution is the best IMO. I
have no idea what's wrong with it.
It's just that rendering these objects as PCI is overhead that you don't
technically need. You only want this backwards compat because you don't
want to require a new bus-driver in the guest, which is a perfectly
reasonable position to take. But that doesn't mean it isn't a
compromise. You are trading more complexity and overhead in the host
for simplicity in the guest. I am trying to clean up this path going
forward.
The guest would discover and configure the device using normal PCI
methods. Qemu emulates the requests, and configures the kernel part
using normal Linux syscalls. The nice thing is, kvm and the kernel
part don't even know about each other, except for a way for hypercalls
to reach the device and a way for interrupts to reach kvm.

Let's stop beating around the bush and just define the 4-5 hypercall
verbs we need and be done with it. :)

FYI: The guest support for this is not really *that* much code IMO.

 drivers/vbus/proxy/Makefile |    2
 drivers/vbus/proxy/kvm.c    |  726 +++++++++++++++++

Does it support device hotplug and hotunplug?

Yes, today (use "ln -s" in configfs to map a device to a bus, and the
guest will see the device immediately).
Can vbus interrupts be load balanced by irqbalance?
Yes (though support for the .affinity verb on the guest's irq-chip is
currently missing... but the backend support is there).
Can guest userspace enumerate devices?
Yes, it presents as a standard LDM device in things like /sys/bus/vbus_proxy
Module autoloading support?
Yes
pxe booting?

No, but this is something I don't think we need for now. If it were
really needed it could be added, I suppose. But there are other
alternatives already, so I am not putting this high on the priority
list. (For instance, you can choose to not use vbus, or you can use
--kernel, etc.)
Plus a port to Windows,
I've already said this is low on my list, but it could always be added
if someone cares that much.
enterprise Linux distros based on 2.6.dead

That's easy, though there is nothing that says we need to. This can be
a 2.6.31-ish thing that they pick up next time.
As a matter of fact, a new bus was developed recently called PCI
express. It uses new slots, new electricals, it's not even a bus
(routers + point-to-point links), new everything except that the
software model was 1000000000000% compatible with traditional PCI. That's how much people are afraid of the Windows ABI.
Come on, Avi. Now you are being silly. So should the USB designers
have tried to make it look like PCI too? Should the PCI designers have
tried to make it look like ISA? :) Yes, there are advantages to making
something backwards compatible. There are also disadvantages to
maintaining that backwards compatibility.
Let me ask you this: If you had a clean slate and were designing a
hypervisor and a guest OS from scratch: What would you make the bus
look like?
virtio-net knows nothing about PCI. If you have a problem with PCI,
write virtio-blah for a new bus.

Can virtio-net use a different backend other than virtio-pci? Cool! I
will look into that. Perhaps that is what I need to make this work
smoothly.
I think you're integrating too tightly with kvm, which is likely to
cause problems when kvm evolves. The way I'd do it is:
- drop all mmu integration; instead, have your devices maintain their
own slots layout and use copy_to_user()/copy_from_user() (or
get_user_pages_fast()).
- never use vmap like structures for more than the length of a request
So does virtio also do demand loading in the backend?
Hmm, I suppose we could do this, but it will definitely affect
performance somewhat. I was thinking that the pages needed for the
basic shm components should be minimal, so it is a good tradeoff to
vmap them in and only demand-load the payload.
I think virtio can be used for much of the same things. There's
nothing in virtio that implies guest/host, or pci, or anything else.
It's similar to your shm/signal and ring abstractions except virtio
folds them together. Is this folding the main problem?

Right. Virtio and ioq overlap, and they do so primarily because I
needed a ring that was compatible with some of my design ideas, yet I
didn't want to break the virtio ABI without a blessing first. If the
signaling were not folded into virtio, that would be a great first
step. I am not sure if there would be other areas to address as well.
As far as I can tell, everything around it just duplicates existing
infrastructure (which may be old and crusty, but so what) without
added value.
I am not sure what you refer to with "everything around it". Are you
talking about the vbus core?