You don't gain simplicity by adding things.
But you are failing to account for the fact that we still have to add
something for PCI if we go with something like the in-kernel model. It's
nice for the userspace side because a) it was already in qemu, and b) we
need it for proper guest support. But presumably we don't have it for
this new thing, so something has to be created (unless this support is
somehow already there and I don't know it?)
Optimization:
Most of PCI (in our context) deals with configuration. So removing it
doesn't optimize anything, unless you're counting hotplugs-per-second
or something.
Most, but not all ;) (Sorry, you left the window open on that one).
What about IRQ routing?
What if I want to coalesce interrupts to
minimize injection overhead? How do I do that in PCI?
How do I route those interrupts in an arbitrarily nested fashion, say,
to a guest userspace?
What about scale? What if Herbert decides to implement a 2048-ring MQ
device ;) There's no great way to do that on x86 with PCI, yet I can do
it in vbus. (And yes, I know, this is ridiculous... just wanting to get
you thinking)
There is no problem supporting an in-kernel host virtio endpoint
with the existing guest/host ABI. Nothing in the ABI assumes the host
endpoint is in userspace. Nothing in the implementation requires us
to move any of the PCI stuff into the kernel.
Well, that's not really true. If the device is a PCI device, there is
*some* stuff that has to go into the kernel. Not an ICH model or
anything, but at least an ability to interact with userspace for
config-space changes, etc.
To avoid reiterating, please be specific about these advantages.
We are both reading the same thread, right?
Last time we measured, hypercall overhead was the same as pio
overhead. Both vmx and svm decode pio completely (except for string
pio ...)
Not on my woodcrests last time I looked, but I'll check again.
True, PCI interrupts suck. But this was fixed with MSI. Why fix it
again?
As I stated, I don't like the constraints imposed even by MSI (though
that is definitely a step in the right direction).
With vbus I can have a device that has an arbitrary number of shm
regions (limited by memory, of course),
each with an arbitrarily routed
signal path that is limited by a u64, even on x86.
Each region can be
signaled bidirectionally and masked with a simple local memory write.
They can be declared on the fly, allowing for the easy expression of
things like nested devices or other dynamic resources. They can be
routed across various topologies, such as IRQs or posix signals, even
across multiple hops in a single path.
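To make that concrete, here is a rough sketch of what a per-region
descriptor could look like. The names and layout are purely
illustrative, not the actual vbus ABI:

    #include <linux/types.h>

    /*
     * Illustrative only: one descriptor per shm region.  The signal
     * path is a full u64 routing cookie even on x86, and masking is
     * just a plain memory write to 'masked'.
     */
    struct shm_region_desc {
            u64 signal_path;  /* where signals for this region route */
            u64 gpa;          /* guest-physical base of the region   */
            u64 len;          /* size, limited only by memory        */
            u32 masked;       /* set/cleared with a local write      */
            u32 flags;        /* e.g. bidirectional signaling        */
    };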
How do I do that in PCI?
What does masking an interrupt look like?
Again, for the nested case?
Interrupt acknowledgment cycles?
None of these require vbus. They can all be done with PCI.
Well, first of all: not really. One of my primary design objectives
with vbus was to a) reduce the signaling as much as possible, and b)
reduce the cost of signaling. That is why I do things like use
explicit hypercalls, aggregated
interrupts, bidir napi to mitigate signaling, the shm_signal::pending
mitigation, and avoiding going to userspace by running in the kernel.
All of these things together help to form what I envision would be a
maximum performance transport. Not all of these tricks are
interdependent (for instance, the bidir + full-duplex threading that I
do can be done in userspace too, as discussed). They are just the
collective design elements that I think we need to make a guest perform
very close to its peak. That is what I am after.
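As a rough illustration of the mitigation idea (the names and layout
here are made up, not the real shm_signal ABI), the producer only
injects when the consumer is unmasked and no signal is already
pending:

    #include <linux/types.h>
    #include <linux/atomic.h>

    /* lives in memory shared by producer and consumer */
    struct shm_signal_state {
            u32 enabled;   /* consumer masks by writing 0 here      */
            u32 pending;   /* producer sets before raising a signal */
    };

    /* producer side: coalesce redundant signals */
    static void shm_signal_notify(struct shm_signal_state *s,
                                  void (*inject)(void *priv), void *priv)
    {
            if (!s->enabled)
                    return;          /* masked: consumer will poll  */
            if (xchg(&s->pending, 1))
                    return;          /* already pending: coalesce   */
            inject(priv);            /* e.g. IRQ or hypercall       */
    }

    /* consumer side: re-arm before draining so no event is lost */
    static void shm_signal_ack(struct shm_signal_state *s)
    {
            xchg(&s->pending, 0);
    }

Masking is just a local write to 'enabled', and the xchg on 'pending'
is what lets back-to-back events collapse into a single injection.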
Second of all, even if you *could* do
this all with PCI, it's not really PCI anymore. So the question I have
is: what's the value in still using it? For the discovery? It's not
very hard to do discovery. I wrote that whole part in a few hours and
it worked the first time I ran it.
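For instance, discovery can be as trivial as asking the bus for a
device count and iterating. This sketch is hypothetical (the
hypercall helpers and record layout are made up), just to show the
scale of the problem:

    #include <linux/init.h>
    #include <linux/kernel.h>
    #include <linux/types.h>

    struct devinfo {
            u64  id;
            char type[64];   /* e.g. "venet" */
    };

    /* hypothetical guest->host discovery hypercalls */
    extern int hc_devcount(void);
    extern int hc_devquery(int idx, struct devinfo *out);

    static int __init enumerate_bus(void)
    {
            struct devinfo info;
            int i, count = hc_devcount();

            for (i = 0; i < count; i++) {
                    if (hc_devquery(i, &info))
                            continue;
                    pr_info("found device %llu, type %s\n",
                            (unsigned long long)info.id, info.type);
            }
            return 0;
    }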
What about that interrupt model I keep talking about? How do you work
around that? How do I nest these to support bypass?
What constraints? Please be specific.
Avi, I have been. Is this an exercise to see how much you can get me to
type? ;)
I'm not saying anything about what the advantages are worth and how
they compare to the cost. I'm asking what are the advantages. Please
don't just assert them into existence.
That's an unfair statement, Avi. Now I would say you are playing
word-games.
All of this overhead is incurred at configuration time. All the
complexity already exists
So you already have the ability to represent PCI devices that are in
the kernel? Is this the device-assignment infrastructure? Cool!
Wouldn't this still need to be adapted to work with software devices?
If not, then I take back the statement that they both add more host
code, and agree that vbus is simply the one adding more.
so we gain nothing by adding a competing implementation. And making
the guest complex in order to simplify the host is a pretty bad
tradeoff considering we maintain one host but want to support many
guests.
It's good to look forward, but in the vbus-dominated universe, what do
we have that we don't have now? Besides simplicity.
A unified framework for declaring virtual resources directly in the
kernel, yet still retaining the natural isolation that we get in
userspace.
The ability to support guests that don't have PCI.
The ability to support things that are not guests.
The ability to support things that are not supported by PCI, like less
hardware-centric signal path routing.
The ability to signal across more than just IRQs.
The ability for nesting (e.g. guest-userspace talking to host-kernel,
etc).
I recognize that this has no bearing on whether you, or anyone else,
cares about these features. But it certainly has features beyond what
we have with PCI, and I hope that is clear now.
I've already said this is low on my list, but it could always be added
if someone cares that much.
That's unreasonable. Windows is an important workload.
Well, this is all GPL, right? I mean, was KVM 100% complete when it
was proposed? Accepted? I am hoping to get some help building the
parts of this infrastructure from anyone interested in the community.
If Windows support is truly important and someone cares, it will get
built soon enough.
I pushed it out now because I have enough working to be useful in and
of itself and to get a review. But it's certainly not done.
Of course we need to. RHEL 4/5 and their equivalents will live for a
long time as guests. Customers will expect good performance.
Okay, easy enough from my perspective. However, I didn't realize it
was very common to backport new features to enterprise distros like
this. I have a sneaking suspicion we wouldn't really need to worry
about this, as the project managers for those products would probably
never allow it. But in the event that it was necessary, I think it
wouldn't be horrendous.
So does virtio also do demand loading in the backend?
Given that it's entirely in userspace, yes.
Ah, right. How does that work, out of curiosity? Do you have to do a
syscall for every page you want to read?
Hmm. I suppose we could do this, but it will definitely affect the
performance somewhat. I was thinking that the pages needed for the
basic shm components should be minimal, so this is a good tradeoff to
vmap them in and only demand load the payload.
This is negotiable :) I won't insist on it, only strongly recommend
it. copy_to_user() should be pretty fast.
It probably is, but generally we can't use it since we are not in the
same context when we need to do the copy (copy_to/from_user assume
"current" is proper). That's ok, there are ways to do what you request
without explicitly using c_t_u().
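For example (purely illustrative; GUP signatures vary across kernel
versions, and this assumes the copy stays within one page), you can
pin the target page and copy through a kernel mapping, with no
dependency on "current" being the right task:

    #include <linux/mm.h>
    #include <linux/highmem.h>

    static int write_remote(struct mm_struct *mm, unsigned long uaddr,
                            const void *src, size_t len)
    {
            struct page *page;
            void *vaddr;
            long got;

            mmap_read_lock(mm);
            got = get_user_pages_remote(mm, uaddr & PAGE_MASK, 1,
                                        FOLL_WRITE, &page, NULL);
            mmap_read_unlock(mm);
            if (got != 1)
                    return -EFAULT;

            /* temporary kernel mapping: no "current" needed */
            vaddr = kmap_local_page(page);
            memcpy(vaddr + (uaddr & ~PAGE_MASK), src, len);
            kunmap_local(vaddr);

            set_page_dirty_lock(page);
            put_page(page);
            return 0;
    }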