Re: [PATCH net-next v5 8/9] xen-netback: Timeout packets in RX path

From: Zoltan Kiss
Date: Mon Jan 20 2014 - 19:25:44 EST


On 20/01/14 22:03, Wei Liu wrote:
On Mon, Jan 20, 2014 at 09:24:28PM +0000, Zoltan Kiss wrote:
@@ -557,12 +577,25 @@ void xenvif_disconnect(struct xenvif *vif)
void xenvif_free(struct xenvif *vif)
{
int i, unmap_timeout = 0;
+ /* Here we want to avoid timeout messages if an skb can be legitimatly
+ * stucked somewhere else. Realisticly this could be an another vif's
+ * internal or QDisc queue. That another vif also has this
+ * rx_drain_timeout_msecs timeout, but the timer only ditches the
+ * internal queue. After that, the QDisc queue can put in worst case
+ * XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS skbs into that another vif's
+ * internal queue, so we need several rounds of such timeouts until we
+ * can be sure that no another vif should have skb's from us. We are
+ * not sending more skb's, so newly stucked packets are not interesting
+ * for us here.
+ */
You beat me to this. Was about to reply to your other email. :-)

It's also worth mentioning that DIV_ROUND_UP part is merely estimation,
as you cannot possible know the maximum / miminum queue length of all
other vifs (as they can be changed during runtime). In practice most
users will stick with the default, but some advanced users might want to
tune this value for individual vif (whether that's a good idea or not is
another topic).

So, in order to convince myself this is safe. I also did some analysis
on the impact of having queue length other than default value. If
queue_len < XENVIF_QUEUE_LENGTH, that means you can queue less packets
in qdisc than default and drain it faster than calculated, which is
safe. On the other hand if queue_len > XENVIF_QUEUE_LENGTH, it means
actually you need more time than calculated. I'm in two minded here. The
default value seems sensible to me but I'm still a bit worried about the
queue_len > XENVIF_QUEUE_LENGTH case.

An idea is to book-keep maximum tx queue len among all vifs and use that
to calculate worst scenario.
I don't think it should be that perfect. This is just a best effort estimation, if someone changes the vif queue length and see this message because of that, nothing very drastic will happen. It is just a rate limited warning message. Well, it is marked as error, because it is a serious condition.
And also, the odds of seeing this message unnecessarily are quite low. With default settings (256 slots, max 17 per skb, 32 queue length, 10 secs queue drain timeout) this delay is 20 seconds. You can raise the queue length to 64 before getting warning (see netif_napi_add), so it would go up to 40 seconds, but anyway, if your vif is sitting on a packet more than 20 seconds, you deserve this message :)

Zoli
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/