Re: [PATCH] net: mscc: ocelot: replace readx_poll_timeout with readx_poll_timeout_atomic

From: Vladimir Oltean
Date: Sat May 16 2020 - 18:52:16 EST


On Sat, 16 May 2020 at 23:54, David Miller <davem@xxxxxxxxxxxxx> wrote:
>
> From: Xulin Sun <xulin.sun@xxxxxxxxxxxxx>
> Date: Fri, 15 May 2020 11:18:13 +0800
>
> > BUG: sleeping function called from invalid context at drivers/net/ethernet/mscc/ocelot.c:59
> > in_atomic(): 1, irqs_disabled(): 0, pid: 3778, name: ifconfig
> > INFO: lockdep is turned off.
> > Preemption disabled at:
> > [<ffff2b163c83b78c>] dev_set_rx_mode+0x24/0x40
> > Hardware name: LS1028A RDB Board (DT)
> > Call trace:
> > dump_backtrace+0x0/0x140
> > show_stack+0x24/0x30
> > dump_stack+0xc4/0x10c
> > ___might_sleep+0x194/0x230
> > __might_sleep+0x58/0x90
> > ocelot_mact_forget+0x74/0xf8
> > ocelot_mc_unsync+0x2c/0x38
> > __hw_addr_sync_dev+0x6c/0x130
> > ocelot_set_rx_mode+0x8c/0xa0
>
> Vladimir states that this call chain is not possible in mainline.
>
> I'm not applying this.

(but the essence of the problem is legitimate though)

There are 2 specific things I don't like:
- The problem is claimed to reproduce on "LS1028A RDB Board (DT)"
which does not call ocelot_set_rx_mode. So it claims to fix a problem
for which only Xulin has the ability to decide whether it is the right
solution or not.
- On ocelot, it _looks_ like it is indeed a problem which was
introduced in 639c1b2625af ("net: mscc: ocelot: Register poll timeout
should be wall time not attempts"). But there was no attempt to bring
it up with the author of that patch, who very clearly expressed that
he is working on hardware where the polling timeout is in the order of
milliseconds, and the timeout for the driver is currently set at 100
ms. I'm not very sure that it is desirable to spin in atomic context
for 100 ms.

Thanks,
-Vladimir