Re: Bisected: Kernel deadlock bug related to cgroups.

From: Brent Lovelace
Date: Wed Aug 31 2016 - 19:20:04 EST




On 08/31/2016 03:33 AM, Balbir Singh wrote:

On 31/08/16 20:16, Oleg Nesterov wrote:
On 08/30, Brent Lovelace wrote:
I found a kernel deadlock regression bug introduced in the 4.4 kernel.
...
I bisected to this commit id:
----------------------------------------------------------------------------------
commit c9e75f0492b248aeaa7af8991a6fc9a21506bc96
Author: Oleg Nesterov <oleg@xxxxxxxxxx>
Date: Fri Nov 27 19:57:19 2015 +0100

cgroup: pids: fix race between cgroup_post_fork() and cgroup_migrate()
Thanks, Brent!

systemd D ffff88007dfcfd10 0 1 0 0x00000000
ffff88007dfcfd10 00ff88007dfcfcd8 0000000000000001 ffff88007e296f80
ffff88007dff8000 ffff88007dfd0000 ffffffff82bae220 ffff880036758c00
fffffffffffffff2 ffff88005e327b00 ffff88007dfcfd28 ffffffff816c571f
Call Trace:
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff81126ed2>] percpu_down_write+0xad/0xc4
[<ffffffff811220c8>] ? wake_up_atomic_t+0x25/0x25
[<ffffffff81169b4d>] __cgroup_procs_write+0x72/0x229
[<ffffffff8112b2eb>] ? lock_acquire+0x103/0x18f
so it sleeps in wait_event() waiting for active readers, and the new
readers will block. In particular, do_exit() will block.

Both new readers and writers block; usually this is a sign that a
reader is already holding the percpu rwsem.
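
To make that concrete, here is a rough userspace analogue (illustrative
only, not kernel code; the thread roles, comments, and the use of a
writer-preferring pthread rwlock are my own stand-ins for the kernel's
percpu_rw_semaphore). Once a writer is sleeping in the lock call, later
readers queue behind it, which is the percpu_down_write() vs.
percpu_down_read() situation above:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t rwsem;

/* plays the role of __cgroup_procs_write(): a writer that must drain readers */
static void *writer(void *arg)
{
	pthread_rwlock_wrlock(&rwsem);	/* sleeps until the existing reader drops the lock */
	puts("writer acquired");
	pthread_rwlock_unlock(&rwsem);
	return NULL;
}

/* plays the role of exit_signals(): a reader arriving after the writer is queued */
static void *late_reader(void *arg)
{
	sleep(1);			/* arrive after the writer is already waiting */
	pthread_rwlock_rdlock(&rwsem);	/* blocks behind the queued writer */
	puts("late reader acquired");
	pthread_rwlock_unlock(&rwsem);
	return NULL;
}

int main(void)
{
	pthread_rwlockattr_t attr;
	pthread_t w, r;

	pthread_rwlockattr_init(&attr);
	/* glibc extension: a waiting writer blocks new readers */
	pthread_rwlockattr_setkind_np(&attr,
			PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP);
	pthread_rwlock_init(&rwsem, &attr);

	pthread_rwlock_rdlock(&rwsem);	/* the existing reader (the fork path) */
	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, late_reader, NULL);

	sleep(2);
	pthread_rwlock_unlock(&rwsem);	/* only now can the writer, then the late reader, run */
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return 0;
}

In the traces below, the existing reader (vsftpd's fork path) never
reaches its unlock because it blocks on net_mutex, so the writer
(systemd) and the late readers (the exiting kpktgend_* threads) stay
stuck for good.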

kworker/u8:2 D ffff880036befb58 0 185 2 0x00000000
Workqueue: netns cleanup_net
ffff880036befb58 00ff880036befbd0 ffffffff00000002 ffff88007e316f80
ffff8800783e8000 ffff880036bf0000 ffff88005917bed0 ffff8800783e8000
ffffffff816c8953 ffff88005917bed8 ffff880036befb70 ffffffff816c571f
Call Trace:
[<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c8982>] schedule_timeout+0x2f/0xd8
[<ffffffff816c9204>] ? _raw_spin_unlock_irq+0x27/0x3f
[<ffffffff816c8953>] ? usleep_range+0x3a/0x3a
[<ffffffff81129ea2>] ? trace_hardirqs_on_caller+0x16f/0x18b
[<ffffffff816c5f44>] do_wait_for_common+0xf0/0x127
[<ffffffff816c5f44>] ? do_wait_for_common+0xf0/0x127
[<ffffffff811101c3>] ? wake_up_q+0x42/0x42
[<ffffffff816c5ff2>] wait_for_common+0x36/0x50
[<ffffffff816c6024>] wait_for_completion+0x18/0x1a
[<ffffffff81106b5f>] kthread_stop+0xc8/0x217
[<ffffffffa06d440c>] pg_net_exit+0xbc/0x112 [pktgen]
[<ffffffff81604fa2>] ops_exit_list+0x3d/0x4e
[<ffffffff81606123>] cleanup_net+0x19f/0x234
[<ffffffff81101aa3>] process_one_work+0x237/0x46b
[<ffffffff8110216d>] worker_thread+0x1e7/0x292
[<ffffffff81101f86>] ? rescuer_thread+0x285/0x285
[<ffffffff81106ded>] kthread+0xc4/0xcc
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
[<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
3 locks held by kworker/u8:2/185:
#0: ("%s""netns"){.+.+.+}, at: [<ffffffff811019ad>]
process_one_work+0x141/0x46b
#1: (net_cleanup_work){+.+.+.}, at: [<ffffffff811019ad>]
process_one_work+0x141/0x46b
#2: (net_mutex){+.+.+.}, at: [<ffffffff81605ffe>] cleanup_net+0x7a/0x234
Note that it sleeps with net_mutex held. Probably waiting for kpktgend_*
below.

vsftpd D ffff880054867c68 0 4352 2611 0x00000000
ffff880054867c68 00ff88005933a480 ffff880000000000 ffff88007e216f80
ffff88005933a480 ffff880054868000 0000000000000246 ffff880054867cc0
ffff88005933a480 ffffffff81cea268 ffff880054867c80 ffffffff816c571f
Call Trace:
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c59b1>] schedule_preempt_disabled+0x10/0x19
[<ffffffff816c64c0>] mutex_lock_nested+0x1c0/0x3a0
[<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
[<ffffffff81606233>] copy_net_ns+0x7b/0xf8
[<ffffffff81606233>] ? copy_net_ns+0x7b/0xf8
[<ffffffff811073c9>] create_new_namespaces+0xfc/0x16b
[<ffffffff8110759c>] copy_namespaces+0x164/0x186
[<ffffffff810ea6b1>] copy_process+0x10d2/0x195d
[<ffffffff810eb094>] _do_fork+0x8c/0x2fb
[<ffffffff81003044>] ? lockdep_sys_exit_thunk+0x12/0x14
[<ffffffff810eb375>] SyS_clone+0x14/0x16
[<ffffffff816c99b6>] entry_SYSCALL_64_fastpath+0x16/0x76
2 locks held by vsftpd/4352:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810e9b97>]
copy_process+0x5b8/0x195d
#1: (net_mutex){+.+.+.}, at: [<ffffffff81606233>] copy_net_ns+0x7b/0xf8
This waits for net_mutex, held by kworker/u8:2 above. And since it
holds cgroup_threadgroup_rwsem for reading, that is why systemd above
hangs.

kpktgend_0 D ffff88005917bce8 0 4354 2 0x00000000
ffff88005917bce8 00ffffffa06d5d06 ffff880000000000 ffff88007e216f80
ffff88007a4ec900 ffff88005917c000 ffff88007a4ec900 ffffffffa06d5d06
ffff88005917bed0 0000000000000000 ffff88005917bd00 ffffffff816c571f
Call Trace:
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff816c571f>] schedule+0x7a/0x8f
[<ffffffff816c85ee>] rwsem_down_read_failed+0xdc/0xf8
[<ffffffff8136d634>] call_rwsem_down_read_failed+0x14/0x30
[<ffffffff8136d634>] ? call_rwsem_down_read_failed+0x14/0x30
[<ffffffff810f91e0>] ? exit_signals+0x17/0x103
[<ffffffff81126dc4>] ? percpu_down_read+0x4d/0x5f
[<ffffffff810f91e0>] exit_signals+0x17/0x103
[<ffffffff810ed342>] do_exit+0x105/0x9a4
[<ffffffffa06d5d06>] ? pg_net_init+0x346/0x346 [pktgen]
[<ffffffff81106df5>] kthread+0xcc/0xcc
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
[<ffffffff816c9d6f>] ret_from_fork+0x3f/0x70
[<ffffffff81106d29>] ? kthread_parkme+0x1f/0x1f
1 lock held by kpktgend_0/4354:
#0: (&cgroup_threadgroup_rwsem){++++++}, at: [<ffffffff810f91e0>]
exit_signals+0x17/0x103
It can't take cgroup_threadgroup_rwsem for reading, so it can't exit,
and that is why kworker/u8:2 hangs. So the cycle is: systemd's writer
waits for vsftpd's reader; vsftpd waits for net_mutex, held by
kworker/u8:2; kworker/u8:2 waits for the kpktgend_* threads to exit;
and the kpktgend_* threads block on cgroup_threadgroup_rwsem behind
the queued writer.

kpktgend_1 D ffff88007a4e3ce8 0 4355 2 0x00000000
...
kpktgend_2 D ffff8800549f7ce8 0 4356 2 0x00000000
...
kpktgend_3 D ffff88005e2b7ce8 0 4357 2 0x00000000
...

The same.

Could you try the recent 568ac888215c7fb2fab "cgroup: reduce read
locked section of cgroup_threadgroup_rwsem during fork" patch?
Attached below.


With this patch copy_net_ns() should be called outside of
cgroup_threadgroup_rwsem, so the deadlock should hopefully go away.
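
In the userspace analogue from earlier, the shape of the fix is roughly
the following (again illustrative only; the function and lock names
here are invented): the fork-path reader no longer holds the rwlock
across the step that takes the second lock, so a queued writer can
always drain the readers and no cycle can form.

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t rwsem = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t net_mutex_analogue = PTHREAD_MUTEX_INITIALIZER;

/* rough analogue of the patched fork path */
static void fork_path(void)
{
	/* keep the read-locked section small: only what really needs the lock */
	pthread_rwlock_rdlock(&rwsem);
	/* ... cgroup bookkeeping analogue ... */
	pthread_rwlock_unlock(&rwsem);

	/* the namespace-copy step, which takes the other lock, now runs
	 * outside the rwlock instead of nested inside the read section */
	pthread_mutex_lock(&net_mutex_analogue);
	puts("namespace copy done outside the rwsem");
	pthread_mutex_unlock(&net_mutex_analogue);
}

int main(void)
{
	fork_path();
	return 0;
}

With the locks de-nested like this, the vsftpd-style task can no longer
sit on the read side of cgroup_threadgroup_rwsem while it waits for
net_mutex.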

Thanks,
Yes, I would be interested in seeing if this race goes away. Thanks for
the pointer, Oleg!

Balbir Singh.

Thank you so much for the help! With 568ac888215c7fb2fab patched into the 4.4 kernel, the deadlock issue went away.