Re: [PATCH net-next v3 3/4] net: lan966x: Add FDMA functionality

From: Horatiu Vultur
Date: Thu Apr 07 2022 - 03:14:56 EST


The 04/06/2022 10:37, Jakub Kicinski wrote:
>
> On Wed, 6 Apr 2022 13:21:15 +0200 Horatiu Vultur wrote:
> > > > +static int lan966x_fdma_tx_alloc(struct lan966x_tx *tx)
> > > > +{
> > > > + struct lan966x *lan966x = tx->lan966x;
> > > > + struct lan966x_tx_dcb *dcb;
> > > > + struct lan966x_db *db;
> > > > + int size;
> > > > + int i, j;
> > > > +
> > > > + tx->dcbs_buf = kcalloc(FDMA_DCB_MAX, sizeof(struct lan966x_tx_dcb_buf),
> > > > + GFP_ATOMIC);
> > > > + if (!tx->dcbs_buf)
> > > > + return -ENOMEM;
> > > > +
> > > > + /* calculate how many pages are needed to allocate the dcbs */
> > > > + size = sizeof(struct lan966x_tx_dcb) * FDMA_DCB_MAX;
> > > > + size = ALIGN(size, PAGE_SIZE);
> > > > + tx->dcbs = dma_alloc_coherent(lan966x->dev, size, &tx->dma, GFP_ATOMIC);
> > >
> > > This function seems to only be called from probe, so GFP_KERNEL
> > > is better.
> >
> > But in the next patch of this series it will be called while holding
> > the lan966x->tx_lock. Should I still change it to GFP_KERNEL here and
> > then change it back to GFP_ATOMIC in the next one?
>
> Ah, I missed that. You can keep the GFP_ATOMIC then.
>
> But I think the reconfig path may be racy. You disable Rx, but don't
> disable napi. NAPI may still be running and doing Rx while you're
> trying to free the rx skbs, no?

Yes, a race is possible there, even though I disable the HW and make
sure the RX FDMA is disabled. A frame can be received, then we get an
interrupt and just call napi_schedule. If the MTU is changed at this
point, napi_poll can still run after we have disabled the HW and the RX
FDMA.
So I will make sure to call napi_synchronize and napi_disable.
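
To make the intended ordering concrete, here is a rough userspace model,
not driver code. The struct and all model_* names are hypothetical; the
flags only stand in for the HW state and for what napi_synchronize and
napi_disable would guarantee in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these flags stand in for kernel/hardware state. */
struct model_rx {
	bool hw_enabled;   /* RX FDMA channel is active */
	bool napi_enabled; /* the poll function may still run while true */
	int bufs;          /* number of RX buffers currently allocated */
};

/* Stand-in for napi_synchronize() followed by napi_disable(). */
static void model_napi_disable(struct model_rx *rx)
{
	rx->napi_enabled = false;
}

/* Freeing is only safe once neither the HW nor the poll can touch buffers. */
static void model_rx_free(struct model_rx *rx)
{
	assert(!rx->hw_enabled && !rx->napi_enabled);
	rx->bufs = 0;
}

/* Hypothetical reconfig order for an MTU change; returns the new count. */
static int model_reconfig(struct model_rx *rx, int new_bufs)
{
	rx->hw_enabled = false;  /* 1. stop the RX FDMA */
	model_napi_disable(rx);  /* 2. wait out and block any pending poll */
	model_rx_free(rx);       /* 3. buffers now have a single owner */
	rx->bufs = new_bufs;     /* 4. reallocate for the new MTU */
	rx->napi_enabled = true; /* 5. allow polling again ... */
	rx->hw_enabled = true;   /* 6. ... and then restart the hardware */
	return rx->bufs;
}
```

Skipping step 2 is exactly the race described above: the assert in
model_rx_free would fire because a scheduled poll could still run.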

>
> Once napi is disabled you can disable Tx and then you have full
> ownership of the Tx side, no need to hold the lock during
> lan966x_fdma_tx_alloc(), I'd think.

I can do that. The only thing is that I need to disable Tx on all the
ports, because the FDMA is shared by all of them.
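
Roughly, this means walking every populated port before touching the
shared Tx rings. A minimal userspace sketch of that loop follows; the
port count, the structs and the model_* names are assumptions for
illustration, and the tx_stopped flag only stands in for stopping the
netdev queue of that port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NUM_PORTS 8 /* hypothetical port count */

struct model_port {
	bool tx_stopped; /* stand-in for the netdev Tx queue state */
};

struct model_switch {
	struct model_port *ports[MODEL_NUM_PORTS]; /* NULL means unused */
};

/* Stop or restart Tx on every populated port; returns how many changed. */
static int model_set_tx_stopped(struct model_switch *sw, bool stopped)
{
	int changed = 0;

	assert(sw != NULL);
	for (size_t i = 0; i < MODEL_NUM_PORTS; i++) {
		struct model_port *port = sw->ports[i];

		if (!port)
			continue; /* skip ports that were never probed */
		if (port->tx_stopped != stopped) {
			port->tx_stopped = stopped;
			changed++;
		}
	}
	return changed;
}
```

The same helper would be called with false once the new rings are in
place, so all ports restart together.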

>
> > > > +int lan966x_fdma_xmit(struct sk_buff *skb, __be32 *ifh, struct net_device *dev)
> > > > +{
> > > > + struct lan966x_port *port = netdev_priv(dev);
> > > > + struct lan966x *lan966x = port->lan966x;
> > > > + struct lan966x_tx_dcb_buf *next_dcb_buf;
> > > > + struct lan966x_tx_dcb *next_dcb, *dcb;
> > > > + struct lan966x_tx *tx = &lan966x->tx;
> > > > + struct lan966x_db *next_db;
> > > > + int needed_headroom;
> > > > + int needed_tailroom;
> > > > + dma_addr_t dma_addr;
> > > > + int next_to_use;
> > > > + int err;
> > > > +
> > > > + /* Get next index */
> > > > + next_to_use = lan966x_fdma_get_next_dcb(tx);
> > > > + if (next_to_use < 0) {
> > > > + netif_stop_queue(dev);
> > > > + return NETDEV_TX_BUSY;
> > > > + }
> > > > +
> > > > + if (skb_put_padto(skb, ETH_ZLEN)) {
> > > > + dev->stats.tx_dropped++;
> > > > + return NETDEV_TX_OK;
> > > > + }
> > > > +
> > > > + /* skb processing */
> > > > + needed_headroom = max_t(int, IFH_LEN * sizeof(u32) - skb_headroom(skb), 0);
> > > > + needed_tailroom = max_t(int, ETH_FCS_LEN - skb_tailroom(skb), 0);
> > > > + if (needed_headroom || needed_tailroom || skb_header_cloned(skb)) {
> > > > + err = pskb_expand_head(skb, needed_headroom, needed_tailroom,
> > > > + GFP_ATOMIC);
> > > > + if (unlikely(err)) {
> > > > + dev->stats.tx_dropped++;
> > > > + err = NETDEV_TX_OK;
> > > > + goto release;
> > > > + }
> > > > + }
> > > > +
> > > > + skb_tx_timestamp(skb);
> > >
> > > This could move down after the dma mapping, so it's closer to when
> > > the device gets ownership.
> >
> > The problem is that if I move this lower, the SKB has already changed
> > because the IFH was added to the frame. If we do timestamping in the
> > PHY, the classify call inside 'skb_clone_tx_timestamp' will then
> > always return PTP_CLASS_NONE, so the PHY will never get the frame.
> > That is the reason why I moved it back.
>
> Oh, I see, makes sense!
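
For anyone following along, the effect can be shown with a toy userspace
model. This is not the kernel classifier: model_is_ptp only checks the
EtherType at offset 12, and the 28-byte zero-filled header is an assumed
size and content used purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_P_1588 0x88F7 /* EtherType for PTP over Ethernet */
#define MODEL_IFH_BYTES  28     /* assumed header size, illustration only */

/* Toy classifier: reads the EtherType at byte offset 12 of the frame. */
static int model_is_ptp(const uint8_t *frame, size_t len)
{
	if (len < 14)
		return 0;
	return ((frame[12] << 8) | frame[13]) == MODEL_ETH_P_1588;
}

/* Prepend a zeroed header; returns the new length, or 0 if out is too small. */
static size_t model_push_ifh(uint8_t *out, size_t out_len,
			     const uint8_t *frame, size_t len)
{
	if (out_len < len + MODEL_IFH_BYTES)
		return 0;
	memset(out, 0, MODEL_IFH_BYTES);
	memcpy(out + MODEL_IFH_BYTES, frame, len);
	return len + MODEL_IFH_BYTES;
}
```

Once the header is in front, offset 12 lands inside the header rather
than on the EtherType, so the frame no longer looks like PTP unless the
classifier is pointed past the header.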

--
/Horatiu