Re: [PATCH net-next v3 00/13] virtio: support packed ring

From: Jason Wang
Date: Wed Nov 21 2018 - 08:46:21 EST



On 2018/11/21 äå8:42, Tiwei Bie wrote:
On Wed, Nov 21, 2018 at 07:20:27AM -0500, Michael S. Tsirkin wrote:
On Wed, Nov 21, 2018 at 06:03:17PM +0800, Tiwei Bie wrote:
Hi,

This patch set implements packed ring support in virtio driver.

A performance test between pktgen (pktgen_sample03_burst_single_flow.sh)
and DPDK vhost (testpmd/rxonly/vhost-PMD) has been done, I saw
~30% performance gain in packed ring in this case.
Thanks a lot, this is very exciting!
Dave, given the holiday, attempts to wrap up the 1.1 spec and the
patchset size I would very much appreciate a bit more time for
review. Say until Nov 28?

To make this patch set work with below patch set for vhost,
some hacks are needed to set the _F_NEXT flag in indirect
descriptors (this should be fixed in vhost):

https://lkml.org/lkml/2018/7/3/33
Could you pls clarify - do you mean it doesn't yet work with vhost
because of a vhost bug, and to test it with the linked patches
you had to hack in _F_NEXT? Because I do not see _F_NEXT
in indirect descriptors in this patch (which is fine).
Or did I miss it?
You didn't miss anything. :)

I think it's a small bug in vhost, which Jason may fix very
quickly, so I didn't post it. Below is the hack I used:


Good catch. I didn't notice the subtle difference since split ring requires for it.

Let me fix it in next version.

Thanks.



diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cd7e755484e3..42faea7d8cf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -980,6 +980,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
unsigned int i, n, err_idx;
u16 head, id;
dma_addr_t addr;
+ int c = 0;
head = vq->packed.next_avail_idx;
desc = alloc_indirect_packed(total_sg, gfp);
@@ -1001,8 +1002,9 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
if (vring_mapping_error(vq, addr))
goto unmap_release;
- desc[i].flags = cpu_to_le16(n < out_sgs ?
- 0 : VRING_DESC_F_WRITE);
+ desc[i].flags = cpu_to_le16((n < out_sgs ?
+ 0 : VRING_DESC_F_WRITE) |
+ (++c == total_sg ? 0 : VRING_DESC_F_NEXT));
desc[i].addr = cpu_to_le64(addr);
desc[i].len = cpu_to_le32(sg->length);
i++;