Re: [PATCH net 3/3] hinic: fix bug of send pkts while setting channels

From: luobin (L)
Date: Thu Sep 03 2020 - 10:49:53 EST


On 2020/9/2 18:16, Eric Dumazet wrote:
>
>
> On 9/2/20 2:41 AM, Luo bin wrote:
>> When calling hinic_close in hinic_set_channels, netif_carrier_off
>> and netif_tx_disable are executed, and TX host resources are freed
>> after that. Core may call hinic_xmit_frame to send pkt after
>> netif_tx_disable within a short time, so we should judge whether
>> carrier is on before sending pkt, otherwise the resources that
>> have already been freed in hinic_close may be accessed.
>>
>> Fixes: 2eed5a8b614b ("hinic: add set_channels ethtool_ops support")
>> Signed-off-by: Luo bin <luobin9@xxxxxxxxxx>
>> ---
>> drivers/net/ethernet/huawei/hinic/hinic_tx.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/huawei/hinic/hinic_tx.c b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
>> index a97498ee6914..a0662552a39c 100644
>> --- a/drivers/net/ethernet/huawei/hinic/hinic_tx.c
>> +++ b/drivers/net/ethernet/huawei/hinic/hinic_tx.c
>> @@ -531,6 +531,11 @@ netdev_tx_t hinic_xmit_frame(struct sk_buff *skb, struct net_device *netdev)
>>  	struct hinic_txq *txq;
>>  	struct hinic_qp *qp;
>>
>> +	if (unlikely(!netif_carrier_ok(netdev))) {
>> +		dev_kfree_skb_any(skb);
>> +		return NETDEV_TX_OK;
>> +	}
>> +
>>  	txq = &nic_dev->txqs[q_id];
>>  	qp = container_of(txq->sq, struct hinic_qp, sq);
>>
>>
>
> Adding this kind of tests in fast path seems a big hammer to me.
>
> See https://marc.info/?l=linux-netdev&m=159903844423389&w=2 for a similar problem.
>
> Normally, after hinic_close() operation, no packet should be sent by core networking stack.
>
> Trying to work around some core networking issue in each driver is a dead end.
Thanks for your review. I agree with what you said. In theory, the core can't call ndo_start_xmit
to send a packet after hinic_close has called netif_tx_disable, because the __QUEUE_STATE_DRV_XOFF
bit is set and that bit is protected by __netif_tx_lock. However, my debug messages show that
hinic_xmit_frame is still called after netif_tx_disable. I'll try to figure out why and fix it.
It seems that the patch from
https://marc.info/?l=linux-netdev&m=159903844423389&w=2 can't fix this problem.
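
For reference, the expectation described above can be modelled with a minimal
userspace sketch. This is not kernel code: struct model_txq and every model_*
function are hypothetical names invented for illustration, with a pthread
mutex standing in for __netif_tx_lock and a bool standing in for the
__QUEUE_STATE_DRV_XOFF bit. It assumes the transmit path checks the stopped
flag while holding the same lock that the disable path takes to set it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a single TX queue (not a kernel type). */
struct model_txq {
	pthread_mutex_t lock;	/* stands in for __netif_tx_lock */
	bool drv_xoff;		/* stands in for __QUEUE_STATE_DRV_XOFF */
	bool resources_valid;	/* stands in for the driver's TX resources */
	int sent;		/* transmit calls that saw valid resources */
	int bad_access;		/* transmit calls that saw freed resources */
};

static void model_txq_init(struct model_txq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->drv_xoff = false;
	q->resources_valid = true;
	q->sent = 0;
	q->bad_access = 0;
}

/*
 * Models the core transmit path: take the queue lock, check the stopped
 * flag, and only then enter the driver's transmit routine.
 * Returns true if the "driver" was called.
 */
static bool model_core_xmit(struct model_txq *q)
{
	bool called = false;

	pthread_mutex_lock(&q->lock);
	if (!q->drv_xoff) {
		if (q->resources_valid)
			q->sent++;
		else
			q->bad_access++;	/* would be a use-after-free */
		called = true;
	}
	pthread_mutex_unlock(&q->lock);
	return called;
}

/* Models the disable step: set the stopped flag under the queue lock. */
static void model_tx_disable(struct model_txq *q)
{
	pthread_mutex_lock(&q->lock);
	q->drv_xoff = true;
	pthread_mutex_unlock(&q->lock);
}

/* Models the close path freeing TX resources after the queue is stopped. */
static void model_free_tx_resources(struct model_txq *q)
{
	q->resources_valid = false;
}

/* Runs the close sequence once and returns the number of bad accesses. */
static int model_demo(void)
{
	struct model_txq q;
	int bad;

	model_txq_init(&q);
	assert(model_core_xmit(&q));	/* queue running: driver is called */
	model_tx_disable(&q);
	model_free_tx_resources(&q);
	assert(!model_core_xmit(&q));	/* queue stopped: driver is skipped */
	assert(q.sent == 1);
	bad = q.bad_access;
	pthread_mutex_destroy(&q.lock);
	return bad;
}
```

In this model, a transmit attempt that starts after model_tx_disable() has
returned can never reach the freed resources, so model_demo() returns 0. That
is why seeing hinic_xmit_frame called after netif_tx_disable is unexpected
and needs further investigation.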