Re: [syzbot] [net?] INFO: task hung in register_nexthop_notifier (3)

From: Antoine Tenart
Date: Thu Mar 21 2024 - 05:22:46 EST


Quoting Eric Dumazet (2024-03-18 15:46:37)
> On Mon, Mar 18, 2024 at 12:26 PM syzbot
> <syzbot+99b8125966713aa4b0c3@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > INFO: task syz-executor.3:6975 blocked for more than 143 seconds.
> > Not tainted 6.8.0-rc7-syzkaller-02500-g76839e2f1fde #0
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > task:syz-executor.3 state:D stack:20920 pid:6975 tgid:6975 ppid:1 flags:0x00004006
> > Call Trace:
> > <TASK>
> > context_switch kernel/sched/core.c:5400 [inline]
> > __schedule+0x17d1/0x49f0 kernel/sched/core.c:6727
> > __schedule_loop kernel/sched/core.c:6802 [inline]
> > schedule+0x149/0x260 kernel/sched/core.c:6817
> > schedule_preempt_disabled+0x13/0x30 kernel/sched/core.c:6874
> > __mutex_lock_common kernel/locking/mutex.c:684 [inline]
> > __mutex_lock+0x6a3/0xd70 kernel/locking/mutex.c:752
> > register_nexthop_notifier+0x84/0x290 net/ipv4/nexthop.c:3863
> > nsim_fib_create+0x8a6/0xa70 drivers/net/netdevsim/fib.c:1587
> > nsim_drv_probe+0x747/0xb80 drivers/net/netdevsim/dev.c:1582
> > really_probe+0x29e/0xc50 drivers/base/dd.c:658
> > __driver_probe_device+0x1a2/0x3e0 drivers/base/dd.c:800
> > driver_probe_device+0x50/0x430 drivers/base/dd.c:830
> > __device_attach_driver+0x2d6/0x530 drivers/base/dd.c:958
> > bus_for_each_drv+0x24e/0x2e0 drivers/base/bus.c:457
> > __device_attach+0x333/0x520 drivers/base/dd.c:1030
> > bus_probe_device+0x189/0x260 drivers/base/bus.c:532
> > device_add+0x8ff/0xca0 drivers/base/core.c:3639
> > nsim_bus_dev_new drivers/net/netdevsim/bus.c:442 [inline]
> > new_device_store+0x3f2/0x890 drivers/net/netdevsim/bus.c:173
> > kernfs_fop_write_iter+0x3a4/0x500 fs/kernfs/file.c:334
>
> So we have a sysfs handler ultimately calling register_nexthop_notifier() or any
> other network control path requiring RTNL.
>
> Note that we have rtnl_trylock() for a reason...

Mentioning the below in case that gives some ideas; feel free to
disregard.

When I looked at similar issues a while ago, the rtnl deadlock actually
happened with the kernfs_node refcount; I haven't looked at this one in
detail though. The mutex in there was only preventing concurrent
writers.

> Or maybe the reason is wrong, if we could change kernfs_fop_write_iter()
> to no longer hold a mutex...

At the time I found a way to safely drop the refcount of those
kernfs_nodes, which then allowed calling rtnl_lock() from sysfs handlers:
https://lore.kernel.org/all/20231018154804.420823-1-atenart@xxxxxxxxxx/T/

Note that this relied on how net devices are unregistered (calling
device_del() under rtnl and later waiting for refs on the netdev to drop
outside of the lock, plus a few other things), so extra modifications
would be needed to generalize the approach. It's also a tradeoff between
fixing those deadlocks without rtnl_trylock() and maintaining quite
complex logic...
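
For context, the rtnl_trylock() side of that tradeoff can be sketched in
userspace roughly as below. This is only an illustration under stated
assumptions: a pthread mutex stands in for RTNL, and demo_store(),
fake_rtnl and stored_value are hypothetical names, not kernel APIs. In
the kernel, a sysfs handler that fails rtnl_trylock() typically returns
restart_syscall() instead of an error code, so the write is retried
rather than blocking while the kernfs reference is held.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the RTNL mutex. */
static pthread_mutex_t fake_rtnl = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical attribute value protected by fake_rtnl. */
static int stored_value;

/*
 * Sketch of a store handler that never sleeps on the lock.
 * Returns the number of bytes consumed on success, or -EAGAIN when the
 * lock is contended (the kernel equivalent would restart the syscall).
 */
static long demo_store(int value, size_t count)
{
	if (pthread_mutex_trylock(&fake_rtnl) != 0)
		return -EAGAIN;	/* back off instead of blocking */

	stored_value = value;
	pthread_mutex_unlock(&fake_rtnl);

	return (long)count;
}
```

The handler never waits for the lock, which is what avoids the deadlock
with device removal, at the cost of the caller having to retry.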

Antoine