Re: [PATCH] net: nvidia: forcedeth: Fix two possible concurrency use-after-free bugs

From: Yanjun Zhu
Date: Tue Jan 08 2019 - 21:34:21 EST



On 2019/1/9 10:03, Jia-Ju Bai wrote:


On 2019/1/9 9:24, Yanjun Zhu wrote:

On 2019/1/8 20:57, Jia-Ju Bai wrote:


On 2019/1/8 20:54, Zhu Yanjun wrote:

On 2019/1/8 20:45, Jia-Ju Bai wrote:
In drivers/net/ethernet/nvidia/forcedeth.c, the functions
nv_start_xmit() and nv_start_xmit_optimized() can be concurrently
executed with nv_poll_controller().

nv_start_xmit
  line 2321: prev_tx_ctx->skb = skb;

nv_start_xmit_optimized
  line 2479: prev_tx_ctx->skb = skb;

nv_poll_controller
  nv_do_nic_poll
    line 4134: spin_lock(&np->lock);
    nv_drain_rxtx
      nv_drain_tx
        nv_release_txskb
          line 2004: dev_kfree_skb_any(tx_skb->skb);

Thus, two possible concurrency use-after-free bugs may occur.

To fix these possible bugs,


Does this really occur? Can you reproduce this ?

This bug is not found by the real execution.
It is found by a static tool written by myself, and then I check it by manual code review.

Before "line 2004: dev_kfree_skb_any(tx_skb->skb);", the following sequence runs:

"

                nv_disable_irq(dev);
                nv_napi_disable(dev);
                netif_tx_lock_bh(dev);
                netif_addr_lock(dev);
                spin_lock(&np->lock);
                /* stop engines */
                nv_stop_rxtx(dev);  <--- this stops the rx/tx engines
                nv_txrx_reset(dev);
"

In this case, does nv_start_xmit or nv_start_xmit_optimized still work well?


nv_stop_rxtx() calls nv_stop_tx(dev).

static void nv_stop_tx(struct net_device *dev)
{
    struct fe_priv *np = netdev_priv(dev);
    u8 __iomem *base = get_hwbase(dev);
    u32 tx_ctrl = readl(base + NvRegTransmitterControl);

    if (!np->mac_in_use)
        tx_ctrl &= ~NVREG_XMITCTL_START;
    else
        tx_ctrl |= NVREG_XMITCTL_TX_PATH_EN;
    writel(tx_ctrl, base + NvRegTransmitterControl);
    if (reg_delay(dev, NvRegTransmitterStatus, NVREG_XMITSTAT_BUSY, 0,
                  NV_TXSTOP_DELAY1, NV_TXSTOP_DELAY1MAX))
        netdev_info(dev, "%s: TransmitterStatus remained busy\n",
                    __func__);

    udelay(NV_TXSTOP_DELAY2);
    if (!np->mac_in_use)
        writel(readl(base + NvRegTransmitPoll) & NVREG_TRANSMITPOLL_MAC_ADDR_REV,
               base + NvRegTransmitPoll);
}

nv_stop_tx() seems to only write hardware registers to stop the transmitter.
But it does not wait until nv_start_xmit() or nv_start_xmit_optimized() finishes execution.
There are 3 interrupt modes in the forcedeth NIC.
In throughput mode (0), every tx & rx packet will generate an interrupt.
In CPU mode (1), interrupts are controlled by a timer.
In dynamic mode (2), the mode toggles between throughput and CPU mode based on network load.

From the source code,

"np->recover_error = 1;" is related to CPU mode.

nv_start_xmit and nv_start_xmit_optimized seem related to throughput mode.

In static void nv_do_nic_poll(struct timer_list *t),
when np->recover_error is set, line 2004 ("dev_kfree_skb_any(tx_skb->skb);") will run.

When "np->recover_error=1", do you think nv_start_xmit or nv_start_xmit_optimized will be called?


Maybe netif_stop_queue() should be used here to stop transmission at the network layer, but this function does not seem to wait, either.
Do you know any function that can wait until ".ndo_start_xmit" finishes execution?


Best wishes,
Jia-Ju Bai