Re: vbus design points: shm and shm-signals

From: Gregory Haskins
Date: Mon Aug 24 2009 - 16:00:40 EST

Hi Anthony,

Anthony Liguori wrote:
> Gregory Haskins wrote:
>> Gregory Haskins wrote:
>>> Ingo Molnar wrote:
>>>> We all love faster code and better management interfaces and tons of
>>>> your prior patches got accepted by Avi. This time you didnt even
>>>> _try_ to improve virtio.
>>> Im sorry, but you are mistaken:
>> BTW: One point that I forgot to point out in this most recent thread
>> that I am particularly proud of here is the design of the vbus
>> shared-memory model. Despite some claims to the contrary; not only is
>> it possible to improve virtio with vbus (as evident by the patch
>> referenced above)...I specifically designed vbus with virtio
>> considerations in mind from the start! In fact, the design is conducive
>> to accelerating a variety of other models as well. Read on for details.
>> Vbus was designed to be _agnostic_ to the shm algorithm in general.
>> This allows you to, of course, run ring algorithms (such as virtqueues,
>> or IOQs), but really any other designs as well, such as shared-tables,
>> etc.
>> A guest driver sees the following interface:
>> struct vbus_device_proxy_ops {
>>     int (*open)(struct vbus_device_proxy *dev, int version, int flags);
>>     int (*close)(struct vbus_device_proxy *dev, int flags);
>>     int (*shm)(struct vbus_device_proxy *dev, int id, int prio,
>>                void *ptr, size_t len,
>>                struct shm_signal_desc *sigdesc, struct shm_signal **signal,
>>                int flags);
>>     int (*call)(struct vbus_device_proxy *dev, u32 func,
>>                 void *data, size_t len, int flags);
>>     void (*release)(struct vbus_device_proxy *dev);
>> };
>> note the ops->shm() method. This allows the driver to register some
>> arbitrary pointer (ptr, len) with the host, optionally embedding a
>> shm_signal_desc object in the memory. If "sigdesc" is non-null, the
>> connector will allocate and return a fully formed shm_signal object in
>> **signal.
> Fundamentally, how is this different than the virtio->add_buf concept?

From my POV, they are at different levels. Calling vbus->shm() is for
establishing a shared-memory region including routing the memory and
signal-path contexts. You do this once at device init time, and then
run some algorithm on top (such as a virtqueue design).

virtio->add_buf() OTOH, is a run-time function. You do this to modify
the shared-memory region that is already established at init time by
something like vbus->shm(). You would do this to queue a network
packet, for instance.
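
To make the two levels concrete, here is a minimal user-space sketch. It
is not the vbus or virtio API: every name in it (demo_region,
demo_shm_register, demo_ring_add) is hypothetical, and it only mimics the
split between a one-time registration and a run-time update.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a region registered once, ops->shm()-style. */
struct demo_region {
    void  *ptr;
    size_t len;
    int    id;
    int    registered;
};

/* Hypothetical ring layout that the driver overlays on the region. */
#define DEMO_RING_SLOTS 4
struct demo_ring {
    unsigned head;                  /* next slot the producer fills */
    unsigned count;                 /* slots currently in flight */
    int      slot[DEMO_RING_SLOTS];
};

/* Init time: establish the region (done once per device). */
static int demo_shm_register(struct demo_region *r, int id,
                             void *ptr, size_t len)
{
    if (!ptr || len < sizeof(struct demo_ring))
        return -1;
    memset(ptr, 0, len);
    r->ptr = ptr;
    r->len = len;
    r->id = id;
    r->registered = 1;
    return 0;
}

/* Run time: modify the already-established region (add_buf-like). */
static int demo_ring_add(struct demo_region *r, int value)
{
    struct demo_ring *ring = r->ptr;

    if (!r->registered || ring->count == DEMO_RING_SLOTS)
        return -1;
    ring->slot[ring->head] = value;
    ring->head = (ring->head + 1) % DEMO_RING_SLOTS;
    ring->count++;
    return 0;
}
```

The registration step knows nothing about rings beyond a minimum size; the
ring layout is purely a convention of the code that runs on top of it.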

That said, shm-signal's closest analogy to virtio would be vq->kick(),
vq->callback(), vq->enable_cb(), and vq->disable_cb(). The difference
is that the notification mechanism isn't associated with a particular
type of shared-memory construct (such as a virt-queue), but instead can
be used with any shared-mem algorithm (at least, if I designed it properly).
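
As a rough sketch of what "not associated with a particular construct"
could look like, the notification object below carries no knowledge of
the data layout at all. The names (demo_signal and friends) are
hypothetical and are not the actual shm-signal interface; the coalescing
of kicks that arrive while disabled is likewise an assumption made for
the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical notification object, independent of the shm layout. */
struct demo_signal {
    bool     enabled;              /* receiver wants callbacks */
    bool     pending;              /* a kick arrived while disabled */
    void   (*notify)(void *priv);  /* receiver's callback, may be NULL */
    void    *priv;
    unsigned delivered;            /* number of notifications delivered */
};

/* kick()-like: signal the other side, or remember that we tried. */
static void demo_signal_inject(struct demo_signal *s)
{
    if (!s->enabled) {
        s->pending = true;
        return;
    }
    s->delivered++;
    if (s->notify)
        s->notify(s->priv);
}

/* disable_cb()-like: suppress notifications for now. */
static void demo_signal_disable(struct demo_signal *s)
{
    s->enabled = false;
}

/* enable_cb()-like: re-arm, delivering one coalesced notification. */
static void demo_signal_enable(struct demo_signal *s)
{
    s->enabled = true;
    if (s->pending) {
        s->pending = false;
        demo_signal_inject(s);
    }
}
```

The same object could sit next to a ring, a table, or anything else,
since it only answers "did something change?" and never "what changed?".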

The closest analogy for vbus->shm() to virtio would be
vdev->config->find_vqs(). Again, the difference is that the algorithm
(ring, etc) is not dictated by the call. You then overlay something
like virtqueue on top.

> virtio provides a mechanism to register scatter/gather lists, associate
> a handle with them, and provides a mechanism for retrieving notification
> that the buffer has been processed.

Yes, and I agree this is very useful for many/most algorithms...but not
all. Sometimes you don't want ring-like semantics, but instead want
something like an idempotent table. (Think of things like interrupt
controllers, timers, etc).

Rings, of course, have a trait that all updates are retained in FIFO
order. For many things (e.g. network, block I/O, etc), this is exactly
what you want. If I say "send packet X" now, and "send packet Y" later,
I want the system to do both (and perhaps in that order), so a ring
scheme works well.

However, sometimes you may want to say "time is now X", and later "time
is now Y". The update value of 'X' is technically superseded by Y and
is stale. But a ring may allow both to exist in-flight within the shm
simultaneously if the recipient (guest or host) is lagging, and the X
may be processed even though its data is now irrelevant. What we really
want is the transform of X->Y to invalidate anything else in flight so
that only Y is visible.

So in a case like this, we may want a different algorithm. Something
like a table which always contains the current/valid value, and a way to
signal in both directions when something interesting happens to that data.
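
The difference can be sketched in a few lines of C. This is a
single-threaded illustration with hypothetical names (demo_fifo,
demo_table); it leaves out the memory barriers and signaling that a real
guest/host shm would need.

```c
#include <assert.h>

#define DEMO_SLOTS 8

/* Ring: every update is retained in FIFO order. */
struct demo_fifo {
    unsigned head, tail;           /* head: next write, tail: next read */
    long     slot[DEMO_SLOTS];
};

static int demo_fifo_push(struct demo_fifo *f, long v)
{
    if (f->head - f->tail == DEMO_SLOTS)
        return -1;                 /* full */
    f->slot[f->head++ % DEMO_SLOTS] = v;
    return 0;
}

static int demo_fifo_pop(struct demo_fifo *f, long *v)
{
    if (f->head == f->tail)
        return -1;                 /* empty */
    *v = f->slot[f->tail++ % DEMO_SLOTS];
    return 0;
}

/* Table: one cell that always holds the current value. */
struct demo_table {
    long     now;
    unsigned generation;           /* bumped on every update */
};

static void demo_table_set(struct demo_table *t, long v)
{
    t->now = v;                    /* the old value is simply overwritten */
    t->generation++;
}
```

If a lagging consumer wakes up after "time is now X" and "time is now Y"
have both been posted, demo_fifo_pop() hands it the stale X first, while
the table only ever shows Y.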

If you think about it, a ring is a superset of this construct...the ring
meta-data is the "shared-table" (e.g. HEAD ptr, TAIL ptr, COUNT, etc).
So we start by introducing the basic shm concept, and allow the next
layer (virtio/virtqueue) in the stack to refine it for its needs.
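
One way to picture this layering, as a sketch with a made-up layout (not
the virtqueue or IOQ format): the region starts with a small header that
behaves like a shared table, and the ring entries are simply whatever
fits behind it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring header: this part is the "shared table". */
struct demo_ring_hdr {
    uint32_t head;
    uint32_t tail;
    uint32_t count;
};

/* Hypothetical ring entry. */
struct demo_ring_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
};

/* The overlay: table first, ring entries after it. */
struct demo_ring_layout {
    struct demo_ring_hdr  hdr;
    struct demo_ring_desc desc[];
};

/* How many entries fit in a registered region of len bytes. */
static size_t demo_ring_capacity(size_t len)
{
    size_t base = offsetof(struct demo_ring_layout, desc);

    if (len < base)
        return 0;
    return (len - base) / sizeof(struct demo_ring_desc);
}
```

The layer that registers the memory only sees (ptr, len); the ring
geometry is derived entirely by the layer on top.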

> vbus provides a mechanism to register a single buffer with an integer
> handle, priority, and a signaling mechanism.

Again, I think we are talking about two different layers. You would
never put entries of different priorities into a single virtio ring.
That wouldn't make sense, as they would just get linearized by the FIFO.

What you *would* do is possibly make multiple virtqueues, each with a
different priority (for instance, 8 rx queues for virtio-net).
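
A minimal sketch of the ring-per-priority idea follows. The names
(demo_dev, demo_enqueue, demo_dequeue) and the strict-priority service
order are assumptions for illustration, not how virtio-net or venet
schedule their queues.

```c
#include <assert.h>

#define DEMO_NPRIO 8
#define DEMO_DEPTH 4

/* One small FIFO per priority level. */
struct demo_q {
    unsigned head, tail;
    int      pkt[DEMO_DEPTH];
};

struct demo_dev {
    struct demo_q q[DEMO_NPRIO];   /* index 0 = lowest priority */
};

static int demo_enqueue(struct demo_dev *d, unsigned prio, int pkt)
{
    struct demo_q *q;

    if (prio >= DEMO_NPRIO)
        return -1;
    q = &d->q[prio];
    if (q->head - q->tail == DEMO_DEPTH)
        return -1;                 /* this priority's ring is full */
    q->pkt[q->head++ % DEMO_DEPTH] = pkt;
    return 0;
}

/*
 * Service the highest-priority non-empty ring. FIFO order still holds
 * within each ring. Returns the priority served, or -1 if all are empty.
 */
static int demo_dequeue(struct demo_dev *d, int *pkt)
{
    for (int p = DEMO_NPRIO - 1; p >= 0; p--) {
        struct demo_q *q = &d->q[p];

        if (q->head != q->tail) {
            *pkt = q->pkt[q->tail++ % DEMO_DEPTH];
            return p;
        }
    }
    return -1;
}
```

Priority is decided by which ring an entry goes into, so no entry ever
has to jump ahead of another inside a ring.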

> So virtio provides builtin support for scatter/gathers whereas vbus
> models priority. But fundamentally, they seem like almost identical
> concepts.

I would say that virtqueue and IOQ are a much closer analogy in terms of
comparison at the scatter-gather level. The virtio device model itself
is similar to a vbus device-model except it's oriented towards the
virtqueue ring design. In addition, a big part of vbus is also what
happens _behind_ the device model.

> If we added priority to virtio->add_buf, would it be equivalent in your
> mind functionally speaking?

As indicated above, this wouldn't be sane. A better design (IMO) is to
use a ring per priority.

> What does one do with priority, btw?

There are, of course, many answers to that question. One particularly
trivial example is 802.1p networking. So, for instance, you can
classify and prioritize network traffic so that things like
control/timing packets are higher priority than best-effort HTTP. Doing
this "right" means you have end-to-end priority within the system (e.g.
your switch/fabric, nics, interrupt controllers, etc). Today, virt is
fairly far removed from being fully integrated in this sense, but the
vbus project is addressing this shortcoming.
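
For the 802.1p case, the priority travels in the 3-bit PCP field at the
top of the 16-bit 802.1Q tag control field. The sketch below extracts it
and picks a queue; the linear PCP-to-queue mapping is only an assumption
for illustration, since the real mapping is a matter of policy.

```c
#include <assert.h>
#include <stdint.h>

/* 802.1Q TCI layout: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
static unsigned demo_pcp_from_tci(uint16_t tci)
{
    return (tci >> 13) & 0x7;
}

/*
 * Hypothetical policy: spread the 8 PCP values evenly over nqueues
 * queues, with higher PCP landing on a higher queue index.
 */
static unsigned demo_queue_for_tci(uint16_t tci, unsigned nqueues)
{
    if (nqueues == 0)
        return 0;
    return demo_pcp_from_tci(tci) * nqueues / 8;
}
```

With eight queues this maps each PCP value to its own ring; with two, PCP
0-3 share the low queue and PCP 4-7 share the high one.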


Kind Regards,
