Re: unregister_netdevice: waiting for DEV to become free (2)

From: Tetsuo Handa
Date: Wed May 01 2019 - 12:17:29 EST


On 2019/05/01 23:52, David Ahern wrote:
> On 5/1/19 7:38 AM, Tetsuo Handa wrote:
>> On 2019/04/30 3:43, David Ahern wrote:
>>>> The attached patch adds a tracepoint to notifier_call_chain. If you have
>>>> KALLSYMS enabled it will show the order of the function handlers:
>>>>
>>>> perf record -e notifier:* -a -g &
>>>>
>>>> ip netns del <NAME>
>>>> <wait a few seconds>
>>>>
>>>> fg
>>>> <ctrl-c on perf-record>
>>>>
>>>> perf script
>>>>
>>>
>>> with the header file this time.
>>>
>>
>> What is the intent of your patch? I can see that many notifiers are called. But
>> how does this help identify which event is responsible for dropping the refcount?
>>
>
> In a previous response you stated: "Since I'm not a netdev person, I
> appreciate if you can explain that shutdown sequence using a flow chart."

Yes, I said. But

>
> The notifier sequence tells you the order of cleanup handlers and what
> happens when a namespace is destroyed.
>
> The dev_hold / dev_put tracepoint helps find the refcnt leak but
> requires some time analyzing the output to match up hold / put stack traces.
>

I already observed that fib_netdev_event() calls rt_flush_dev() which becomes a no-op
after the refcount of a dev is moved to the loopback device in that namespace.
I think that there is no event which can drop the loopback device in that namespace.

[ 71.388104][ T7620] rt_flush_dev(00000000cd35e96a)->(00000000d9f4ea20)
[ 71.391757][ T7620] dev_hold(00000000d9f4ea20): 7->8
[ 71.394725][ T7620] CPU: 4 PID: 7620 Comm: a.out Not tainted 5.1.0-rc5+ #177
[ 71.398094][ T7620] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 71.403711][ T7620] Call Trace:
[ 71.405912][ T7620] dump_stack+0xaa/0xd8
[ 71.408252][ T7620] rt_flush_dev+0x177/0x1b0
[ 71.410802][ T7620] fib_netdev_event+0x150/0x1b0
[ 71.413270][ T7620] notifier_call_chain+0x47/0xd0
[ 71.415849][ T7620] raw_notifier_call_chain+0x2d/0x40
[ 71.418491][ T7620] ? tun_show_group+0x90/0x90
[ 71.421108][ T7620] call_netdevice_notifiers_info+0x32/0x70
[ 71.423854][ T7620] rollback_registered_many+0x421/0x680
[ 71.426583][ T7620] rollback_registered+0x68/0xb0
[ 71.429244][ T7620] unregister_netdevice_queue+0xa5/0x100
[ 71.432191][ T7620] __tun_detach+0x576/0x590
[ 71.435533][ T7620] tun_chr_close+0x41/0x80
[ 71.437957][ T7620] ? __tun_detach+0x590/0x590
[ 71.440500][ T7620] __fput+0xeb/0x2d0
[ 71.442816][ T7620] ____fput+0x15/0x20
[ 71.445090][ T7620] task_work_run+0xa9/0xd0
[ 71.447467][ T7620] do_exit+0x37a/0xf40
[ 71.449623][ T7620] do_group_exit+0x57/0xe0
[ 71.451826][ T7620] get_signal+0x114/0x950
[ 71.453989][ T7620] do_signal+0x2f/0x700
[ 71.456126][ T7620] ? handle_mm_fault+0x1a8/0x360
[ 71.458323][ T7620] ? __x64_sys_futex+0x179/0x210
[ 71.460620][ T7620] exit_to_usermode_loop+0x159/0x180
[ 71.462956][ T7620] do_syscall_64+0x15d/0x180
[ 71.465110][ T7620] entry_SYSCALL_64_after_hwframe+0x44/0xa9

[ 89.873350][ T4478] rt_flush_dev(00000000d9f4ea20)->(00000000d9f4ea20)
[ 89.876311][ T4478] dev_hold(00000000d9f4ea20): 34->35
[ 89.878712][ T4478] CPU: 2 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
[ 89.881981][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 89.887273][ T4478] Workqueue: netns cleanup_net
[ 89.889712][ T4478] Call Trace:
[ 89.891737][ T4478] dump_stack+0xaa/0xd8
[ 89.894127][ T4478] rt_flush_dev+0x177/0x1b0
[ 89.896477][ T4478] fib_netdev_event+0x150/0x1b0
[ 89.898810][ T4478] notifier_call_chain+0x47/0xd0
[ 89.901348][ T4478] raw_notifier_call_chain+0x2d/0x40
[ 89.903974][ T4478] call_netdevice_notifiers_info+0x32/0x70
[ 89.906450][ T4478] rollback_registered_many+0x421/0x680
[ 89.909125][ T4478] unregister_netdevice_many.part.119+0x17/0x90
[ 89.911833][ T4478] default_device_exit_batch+0x1a1/0x1d0
[ 89.914287][ T4478] ? do_wait_intr_irq+0xb0/0xb0
[ 89.916720][ T4478] ? unregister_netdevice_many+0x30/0x30
[ 89.919258][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
[ 89.921759][ T4478] ops_exit_list.isra.6+0x75/0x90
[ 89.924396][ T4478] cleanup_net+0x20d/0x380
[ 89.926632][ T4478] process_one_work+0x202/0x4f0
[ 89.929045][ T4478] worker_thread+0x3c/0x4b0
[ 89.931398][ T4478] kthread+0x139/0x160
[ 89.933448][ T4478] ? process_one_work+0x4f0/0x4f0
[ 89.935887][ T4478] ? kthread_destroy_worker+0x70/0x70
[ 89.938243][ T4478] ret_from_fork+0x35/0x40
[ 89.940530][ T4478] dev_put(00000000d9f4ea20): 35->34
[ 89.943031][ T4478] CPU: 2 PID: 4478 Comm: kworker/u128:28 Not tainted 5.1.0-rc5+ #177
[ 89.946064][ T4478] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
[ 89.951324][ T4478] Workqueue: netns cleanup_net
[ 89.953646][ T4478] Call Trace:
[ 89.955652][ T4478] dump_stack+0xaa/0xd8
[ 89.957857][ T4478] rt_flush_dev+0x19f/0x1b0
[ 89.960187][ T4478] fib_netdev_event+0x150/0x1b0
[ 89.962706][ T4478] notifier_call_chain+0x47/0xd0
[ 89.965044][ T4478] raw_notifier_call_chain+0x2d/0x40
[ 89.967503][ T4478] call_netdevice_notifiers_info+0x32/0x70
[ 89.970139][ T4478] rollback_registered_many+0x421/0x680
[ 89.972618][ T4478] unregister_netdevice_many.part.119+0x17/0x90
[ 89.975364][ T4478] default_device_exit_batch+0x1a1/0x1d0
[ 89.977827][ T4478] ? do_wait_intr_irq+0xb0/0xb0
[ 89.980066][ T4478] ? unregister_netdevice_many+0x30/0x30
[ 89.982761][ T4478] ? dev_change_net_namespace+0x4e0/0x4e0
[ 89.985231][ T4478] ops_exit_list.isra.6+0x75/0x90
[ 89.987756][ T4478] cleanup_net+0x20d/0x380
[ 89.990090][ T4478] process_one_work+0x202/0x4f0
[ 89.992384][ T4478] worker_thread+0x3c/0x4b0
[ 89.994702][ T4478] kthread+0x139/0x160
[ 89.996749][ T4478] ? process_one_work+0x4f0/0x4f0
[ 89.999116][ T4478] ? kthread_destroy_worker+0x70/0x70
[ 90.001580][ T4478] ret_from_fork+0x35/0x40