Re: [RFC] e1000e: Relax condition to trigger reset for ME workaround

From: Alexander Duyck
Date: Thu May 14 2020 - 10:57:46 EST


On Mon, May 11, 2020 at 9:45 PM Punit Agrawal
<punit1.agrawal@xxxxxxxxxxxxx> wrote:
>
> It's an error if the value of the RX/TX tail descriptor does not match
> what was written. The error condition is true regardless the duration
> of the interference from ME. But the code only performs the reset if
> E1000_ICH_FWSM_PCIM2PCI_COUNT (2000) iterations of 50us delay have
> transpired. The extra condition can lead to inconsistency between the
> state of hardware as expected by the driver.
>
> Fix this by dropping the check for number of delay iterations.
>
> Signed-off-by: Punit Agrawal <punit1.agrawal@xxxxxxxxxxxxx>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@xxxxxxxxx>
> Cc: "David S. Miller" <davem@xxxxxxxxxxxxx>
> Cc: intel-wired-lan@xxxxxxxxxxxxxxxx
> Cc: netdev@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> ---
> Hi,
>
> The issue was noticed through code inspection while backporting the
> workaround for TDT corruption. Sending it out as an RFC as I am not
> familiar with the hardware internals of the e1000e.
>
> Another unresolved question is the inherent racy nature of the
> workaround - the ME could block access again after releasing the
> device (i.e., BIT(E1000_ICH_FWSM_PCIM2PCI) clear) but before the
> driver performs the write. Has this not been a problem?
>
> Any feedback on the patch or the more information on the issues
> appreciated.
>
> Thanks,
> Punit
>
> drivers/net/ethernet/intel/e1000e/netdev.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 177c6da80c57..5ed4d7ed35b3 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -607,11 +607,11 @@ static void e1000e_update_rdt_wa(struct e1000_ring *rx_ring, unsigned int i)
> {
> struct e1000_adapter *adapter = rx_ring->adapter;
> struct e1000_hw *hw = &adapter->hw;
> - s32 ret_val = __ew32_prepare(hw);
>
> + __ew32_prepare(hw);
> writel(i, rx_ring->tail);
>
> - if (unlikely(!ret_val && (i != readl(rx_ring->tail)))) {
> + if (unlikely(i != readl(rx_ring->tail))) {
> u32 rctl = er32(RCTL);
>
> ew32(RCTL, rctl & ~E1000_RCTL_EN);
> @@ -624,11 +624,11 @@ static void e1000e_update_tdt_wa(struct e1000_ring *tx_ring, unsigned int i)
> {
> struct e1000_adapter *adapter = tx_ring->adapter;
> struct e1000_hw *hw = &adapter->hw;
> - s32 ret_val = __ew32_prepare(hw);
>
> + __ew32_prepare(hw);
> writel(i, tx_ring->tail);
>
> - if (unlikely(!ret_val && (i != readl(tx_ring->tail)))) {
> + if (unlikely(i != readl(tx_ring->tail))) {
> u32 tctl = er32(TCTL);
>
> ew32(TCTL, tctl & ~E1000_TCTL_EN);

You are eliminating the timeout check in favor of just verifying if
the write succeeded or not. Seems pretty straight forward to me.

One other change you may consider making would be to drop the return
value from __ew32_prepare since it doesn't appear to be used anywhere,
make the function static, and maybe get rid of the prototype in
e1000.h.

Reviewed-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxxxxxxxx>