Re: [PATCH net v2] ixgbe: fix double clean of tx descriptors with xdp

From: Alexander Duyck
Date: Thu Aug 22 2019 - 13:10:17 EST


On Thu, Aug 22, 2019 at 9:58 AM Ilya Maximets <i.maximets@xxxxxxxxxxx> wrote:
>
> On 22.08.2019 19:38, Alexander Duyck wrote:
> > On Thu, Aug 22, 2019 at 5:30 AM Ilya Maximets <i.maximets@xxxxxxxxxxx> wrote:
> >>
> >> Tx code doesn't clear the descriptors' status after cleaning.
> >> So, if the budget is larger than number of used elems in a ring, some
> >> descriptors will be accounted twice and xsk_umem_complete_tx will move
> >> prod_tail far beyond the prod_head breaking the comletion queue ring.
> >>
> >> Fix that by limiting the number of descriptors to clean by the number
> >> of used descriptors in the tx ring.
> >>
> >> 'ixgbe_clean_xdp_tx_irq()' function refactored to look more like
> >> 'ixgbe_xsk_clean_tx_ring()' since we don't need most of the
> >> complications implemented in the regular 'ixgbe_clean_tx_irq()'
> >> and we're allowed to directly use 'next_to_clean' and 'next_to_use'
> >> indexes.
> >>
> >> Fixes: 8221c5eba8c1 ("ixgbe: add AF_XDP zero-copy Tx support")
> >> Signed-off-by: Ilya Maximets <i.maximets@xxxxxxxxxxx>
> >> ---
> >>
> >> Version 2:
> >> * 'ixgbe_clean_xdp_tx_irq()' refactored to look more like
> >> 'ixgbe_xsk_clean_tx_ring()'.
> >>
> >> drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 34 ++++++++------------
> >> 1 file changed, 13 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> index 6b609553329f..d1297660e14a 100644
> >> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> >> @@ -633,22 +633,23 @@ static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
> >> bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
> >> struct ixgbe_ring *tx_ring, int napi_budget)
> >> {
> >> + u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
> >> unsigned int total_packets = 0, total_bytes = 0;
> >> - u32 i = tx_ring->next_to_clean, xsk_frames = 0;
> >> unsigned int budget = q_vector->tx.work_limit;
> >> struct xdp_umem *umem = tx_ring->xsk_umem;
> >> - union ixgbe_adv_tx_desc *tx_desc;
> >> - struct ixgbe_tx_buffer *tx_bi;
> >> + u32 xsk_frames = 0;
> >> bool xmit_done;
> >>
> >> - tx_bi = &tx_ring->tx_buffer_info[i];
> >> - tx_desc = IXGBE_TX_DESC(tx_ring, i);
> >> - i -= tx_ring->count;
> >> + while (likely(ntc != ntu && budget)) {
> >
> > I would say you can get rid of budget entirely. It was only really
> > needed for the regular Tx case where you can have multiple CPUs
> > feeding a single Tx queue and causing a stall. Since we have a 1:1
> > mapping we should never have more than the Rx budget worth of packets
> > to really process. In addition we can only make one pass through the
> > ring since the ntu value is not updated while running the loop.
>
> OK. Will remove.
>
> >
> >> + union ixgbe_adv_tx_desc *tx_desc;
> >> + struct ixgbe_tx_buffer *tx_bi;
> >> +
> >> + tx_desc = IXGBE_TX_DESC(tx_ring, ntc);
> >>
> >> - do {
> >> if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
> >> break;
> >>
> >> + tx_bi = &tx_ring->tx_buffer_info[ntc];
> >
> > Please don't move this logic into the loop. We were intentionally
> > processing this outside of the loop once and then just doing the
> > increments because it is faster that way. It takes several operations
> > to compute tx_bi based on ntc, whereas just incrementing is a single
> > operation.
>
> OK.
>
> >
> >> total_bytes += tx_bi->bytecount;
> >> total_packets += tx_bi->gso_segs;
> >>
> >> @@ -659,24 +660,15 @@ bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
> >>
> >> tx_bi->xdpf = NULL;
> >>
> >> - tx_bi++;
> >> - tx_desc++;
> >> - i++;
> >> - if (unlikely(!i)) {
> >> - i -= tx_ring->count;
> >
> > So these two lines can probably just be replaced by:
> > if (unlikely(ntc == tx_ring->count)) {
> > ntc = 0;
>
> Sure.
>
> >
> >> - tx_bi = tx_ring->tx_buffer_info;
> >> - tx_desc = IXGBE_TX_DESC(tx_ring, 0);
> >> - }
> >> -
> >> - /* issue prefetch for next Tx descriptor */
> >> - prefetch(tx_desc);
> >
> > Did you just drop the prefetch?
>
> I'll keep the prefetch in v3 because, as you fairly mentioned, it's not
> related to this patch. However, I'm not sure if this prefetch makes any
> sense here, because there is only one comparison operation between the
> prefetch and the data usage:
>
> while (ntc != ntu) {
> if (!(tx_desc->wb.status ...
> <...>
> prefetch(tx_desc);
> }

I'm not opposed to dropping the prefetch, but if you are going to do
it you should do it in a separate patch.