Re: [PATCH v3 4/4] virtio_net: disable cb aggressively

From: Laurent Vivier
Date: Mon Jan 16 2023 - 08:42:44 EST


Hi Michael,

On 5/26/21 10:24, Michael S. Tsirkin wrote:
There are currently two cases where we poll TX vq not in response to a
callback: start xmit and rx napi. We currently do this with callbacks
enabled which can cause extra interrupts from the card. Used not to be
a big issue as we run with interrupts disabled but that is no longer the
case, and in some cases the rate of spurious interrupts is so high
linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
---
drivers/net/virtio_net.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c29f42d1e04f..a83dc038d8af 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
return;
if (__netif_tx_trylock(txq)) {
- free_old_xmit_skbs(sq, true);
+ do {
+ virtqueue_disable_cb(sq->vq);
+ free_old_xmit_skbs(sq, true);
+ } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
netif_tx_wake_queue(txq);
@@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
bool kick = !netdev_xmit_more();
bool use_napi = sq->napi.weight;
+ unsigned int bytes = skb->len;
/* Free up any pending old buffers before queueing new ones. */
- free_old_xmit_skbs(sq, false);
+ do {
+ if (use_napi)
+ virtqueue_disable_cb(sq->vq);
- if (use_napi && kick)
- virtqueue_enable_cb_delayed(sq->vq);
+ free_old_xmit_skbs(sq, false);
+
+ } while (use_napi && kick &&
+ unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
/* timestamp packet in software */
skb_tx_timestamp(skb);

This patch seems to introduce a problem with QEMU connected to passt using netdev stream backend.

When I run an iperf3 test I get after 1 or 2 seconds of test:

[ 254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
...
[ 254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 8856000 usecs ago
[ 259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, name: output.0, 13951000 usecs ago

In QEMU, I can see in virtio_net_tx_bh() the function virtio_net_flush_tx() has flushed all the queue entries and re-enabled the queue notification with virtio_queue_set_notification() and tries to flush again the queue and as it is empty it does nothing more and then rely on a kernel notification to re-enable the bottom half function. As this notification never comes the queue is stuck and kernel add entries but QEMU doesn't remove them:

2812 static void virtio_net_tx_bh(void *opaque)
2813 {
...
2833 ret = virtio_net_flush_tx(q);

-> flush the queue and ret is not an error and not n->tx_burst (that would re-schedule the function)

...
2850 virtio_queue_set_notification(q->tx_vq, 1);

-> re-enable the queue notification

2851 ret = virtio_net_flush_tx(q);
2852 if (ret == -EINVAL) {
2853 return;
2854 } else if (ret > 0) {
2855 virtio_queue_set_notification(q->tx_vq, 0);
2856 qemu_bh_schedule(q->tx_bh);
2857 q->tx_waiting = 1;
2858 }

-> ret is 0, exit the function without re-scheduling the function.
...
2859 }

If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb aggressively")), it works fine.

How to reproduce it:

I start passt (https://passt.top/passt):

passt -f

and then QEMU

qemu-system-x86_64 ... --netdev stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0

Host side:

sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
iperf3 -s

Guest side:

sysctl -w net.core.rmem_max=536870912
sysctl -w net.core.wmem_max=536870912

ip link set dev $DEV mtu 256
iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M

Any idea of what is the problem?

Thanks,
Laurent