Re: [RFC v2] virtio: support packed ring

From: Michael S. Tsirkin
Date: Mon Apr 23 2018 - 21:30:13 EST


On Tue, Apr 24, 2018 at 09:16:38AM +0800, Tiwei Bie wrote:
> On Tue, Apr 24, 2018 at 04:05:07AM +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 24, 2018 at 08:54:52AM +0800, Jason Wang wrote:
> > >
> > >
> > > On 2018å04æ23æ 17:29, Tiwei Bie wrote:
> > > > On Mon, Apr 23, 2018 at 01:42:14PM +0800, Jason Wang wrote:
> > > > > On 2018å04æ01æ 22:12, Tiwei Bie wrote:
> > > > > > Hello everyone,
> > > > > >
> > > > > > This RFC implements packed ring support for virtio driver.
> > > > > >
> > > > > > The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
> > > > > > by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
> > > > > > Minor changes are needed for the vhost code, e.g. to kick the guest.
> > > > > >
> > > > > > TODO:
> > > > > > - Refinements and bug fixes;
> > > > > > - Split into small patches;
> > > > > > - Test indirect descriptor support;
> > > > > > - Test/fix event suppression support;
> > > > > > - Test devices other than net;
> > > > > >
> > > > > > RFC v1 -> RFC v2:
> > > > > > - Add indirect descriptor support - compile test only;
> > > > > > - Add event suppression supprt - compile test only;
> > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > - Avoid using '%' operator (Jason);
> > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > - Some other refinements and bug fixes;
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@xxxxxxxxx>
> > > > > > ---
> > > > > > drivers/virtio/virtio_ring.c | 1094 +++++++++++++++++++++++++++++-------
> > > > > > include/linux/virtio_ring.h | 8 +-
> > > > > > include/uapi/linux/virtio_config.h | 12 +-
> > > > > > include/uapi/linux/virtio_ring.h | 61 ++
> > > > > > 4 files changed, 980 insertions(+), 195 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 71458f493cf8..0515dca34d77 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -58,14 +58,15 @@
> > > > > [...]
> > > > >
> > > > > > +
> > > > > > + if (vq->indirect) {
> > > > > > + u32 len;
> > > > > > +
> > > > > > + desc = vq->desc_state[head].indir_desc;
> > > > > > + /* Free the indirect table, if any, now that it's unmapped. */
> > > > > > + if (!desc)
> > > > > > + goto out;
> > > > > > +
> > > > > > + len = virtio32_to_cpu(vq->vq.vdev,
> > > > > > + vq->vring_packed.desc[head].len);
> > > > > > +
> > > > > > + BUG_ON(!(vq->vring_packed.desc[head].flags &
> > > > > > + cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > > > > It looks to me spec does not force to keep VRING_DESC_F_INDIRECT here. So we
> > > > > can safely remove this BUG_ON() here.
> > > > >
> > > > > > + BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > > > > Len could be ignored for used descriptor according to the spec, so we need
> > > > > remove this BUG_ON() too.
> > > > Yeah, you're right! The BUG_ON() isn't right. I'll remove it.
> > > > And I think something related to this in the spec isn't very
> > > > clear currently.
> > > >
> > > > In the spec, there are below words:
> > > >
> > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L272
> > > > """
> > > > In descriptors with VIRTQ_DESC_F_INDIRECT set VIRTQ_DESC_F_WRITE
> > > > is reserved and is ignored by the device.
> > > > """
> > > >
> > > > So when device writes back an used descriptor in this case,
> > > > device may not set the VIRTQ_DESC_F_WRITE flag as the flag
> > > > is reserved and should be ignored.
> > > >
> > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L170
> > > > """
> > > > Element Length is reserved for used descriptors without the
> > > > VIRTQ_DESC_F_WRITE flag, and is ignored by drivers.
> > > > """
> > > >
> > > > And this is the way how driver ignores the `len` in an used
> > > > descriptor.
> > > >
> > > > https://github.com/oasis-tcs/virtio-spec/blob/d4fec517dfcf/packed-ring.tex#L241
> > > > """
> > > > To increase ring capacity the driver can store a (read-only
> > > > by the device) table of indirect descriptors anywhere in memory,
> > > > and insert a descriptor in the main virtqueue (with \field{Flags}
> > > > bit VIRTQ_DESC_F_INDIRECT on) that refers to a buffer element
> > > > containing this indirect descriptor table;
> > > > """
> > > >
> > > > So the indirect descriptors in the table are read-only by
> > > > the device. And the only descriptor which is writeable by
> > > > the device is the descriptor in the main virtqueue (with
> > > > Flags bit VIRTQ_DESC_F_INDIRECT on). So if we ignore the
> > > > `len` in this descriptor, we won't be able to get the
> > > > length of the data written by the device.
> > > >
> > > > So I think the `len` in this descriptor will carry the
> > > > length of the data written by the device (if the buffers
> > > > are writable to the device) even if the VIRTQ_DESC_F_WRITE
> > > > isn't set by the device. How do you think?
> > >
> > > Yes I think so. But we'd better need clarification from Michael.
> >
> > I think if you use a descriptor, and you want to supply len
> > to guest, you set VIRTQ_DESC_F_WRITE in the used descriptor.
> > Spec also says you must not set VIRTQ_DESC_F_INDIRECT then.
> > If that's a problem we could look at relaxing that last requirement -
> > does driver want INDIRECT in used descriptor to match
> > the value in the avail descriptor for some reason?
>
> For indirect, driver needs some way to get the length
> of the data written by the driver. And the descriptors
> in the indirect table is read-only, so the only place
> device could put this value is the descriptor with the
> VIRTQ_DESC_F_INDIRECT flag set.

when writing out used descriptor, device should set VIRTQ_DESC_F_WRITE
(and clear VIRTQ_DESC_F_INDIRECT).

> >
> > > >
> > > >
> > > > > The reason is we don't touch descriptor ring in the case of split, so
> > > > > BUG_ON()s may help there.
> > > > >
> > > > > > +
> > > > > > + for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> > > > > > + vring_unmap_one_packed(vq, &desc[j]);
> > > > > > +
> > > > > > + kfree(desc);
> > > > > > + vq->desc_state[head].indir_desc = NULL;
> > > > > > + } else if (ctx) {
> > > > > > + *ctx = vq->desc_state[head].indir_desc;
> > > > > > + }
> > > > > > +
> > > > > > +out:
> > > > > > + return vq->desc_state[head].num;
> > > > > > +}
> > > > > > +
> > > > > > +static inline bool more_used_split(const struct vring_virtqueue *vq)
> > > > > > {
> > > > > > return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> > > > > > }
> > > > > > +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > > > > > +{
> > > > > > + u16 last_used, flags;
> > > > > > + bool avail, used;
> > > > > > +
> > > > > > + if (vq->vq.num_free == vq->vring_packed.num)
> > > > > > + return false;
> > > > > > +
> > > > > > + last_used = vq->last_used_idx;
> > > > > > + flags = virtio16_to_cpu(vq->vq.vdev,
> > > > > > + vq->vring_packed.desc[last_used].flags);
> > > > > > + avail = flags & VRING_DESC_F_AVAIL(1);
> > > > > > + used = flags & VRING_DESC_F_USED(1);
> > > > > > +
> > > > > > + return avail == used;
> > > > > > +}
> > > > > This looks interesting, spec said:
> > > > >
> > > > > "
> > > > > Thus VIRTQ_DESC_F_AVAIL and VIRTQ_DESC_F_USED bits are different for an
> > > > > available descriptor and
> > > > > equal for a used descriptor.
> > > > > Note that this observation is mostly useful for sanity-checking as these are
> > > > > necessary but not sufficient
> > > > > conditions - for example, all descriptors are zero-initialized. To detect
> > > > > used and available descriptors it is
> > > > > possible for drivers and devices to keep track of the last observed value of
> > > > > VIRTQ_DESC_F_USED/VIRTQ_-
> > > > > DESC_F_AVAIL. Other techniques to detect
> > > > > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > > > > might also be possible.
> > > > > "
> > > > >
> > > > > So it looks to me it was not sufficient, looking at the example codes in
> > > > > spec, do we need to track last seen used_wrap_counter here?
> > > > I don't think we have to track used_wrap_counter in
> > > > driver. There was a discussion on this:
> > > >
> > > > https://lists.oasis-open.org/archives/virtio-dev/201802/msg00177.html
> > > >
> > > > And after that, below sentence was added (it's also
> > > > in the above words you quoted):
> > > >
> > > > """
> > > > Other techniques to detect
> > > > VIRTQ_DESC_F_AVAIL/VIRTQ_DESC_F_USED bit changes
> > > > might also be possible.
> > > > """
> > > >
> > > > Best regards,
> > > > Tiwei Bie
> > >
> > > I see, the extra condition "if (vq->vq.num_free == vq->vring_packed.num)"
> > > help in this case.
> > >
> > > Thanks
> >
> > I still think tracking a wrap counter is better.
>
> >From my understanding, wrap counter is only needed when
> one side just want to update parts of the status bit(s),
> it's something like the "report status" or "write back"
> feature [1] in the hardware NIC. And in the driver, all
> the status must be updated, and that's why I don't want
> to track the usedwrap counter.
>
> [1] https://github.com/btw616/dpdk-virtio1.1/commit/ca837865bd10
>
> Best regards,
> Tiwei Bie
>
> >
> > > >
> > > > > Thanks