Re: [PATCH] capabilities: add capability cgroup controller

From: Topi Miettinen
Date: Thu Jul 07 2016 - 16:27:32 EST


On 07/07/16 09:16, Petr Mladek wrote:
> On Sun 2016-07-03 15:08:07, Topi Miettinen wrote:
>> The attached patch would make any uses of capabilities generate audit
>> messages. It works for simple tests as you can see from the commit
>> message, but unfortunately the call to audit_cgroup_list() deadlocks the
>> system when booting a full blown OS. There's no deadlock when the call
>> is removed.
>>
>> I guess that in some cases, cgroup_mutex and/or css_set_lock could be
>> already held earlier before entering audit_cgroup_list(). Holding the
>> locks is however required by task_cgroup_from_root(). Is there any way
>> to avoid this? For example, only print some kind of cgroup ID numbers
>> (are there unique and stable IDs, available without locks?) for those
>> cgroups where the task is registered in the audit message?
>
> I am not sure anyone knows what really happens here. I suggest
> enabling lockdep. It might detect a possible deadlock even before it
> really happens; see Documentation/locking/lockdep-design.txt
>
> It can be enabled by
>
> CONFIG_PROVE_LOCKING=y
>
> It depends on
>
> CONFIG_DEBUG_KERNEL=y
>
> and maybe some more options, see lib/Kconfig.debug

Thanks a lot! I caught this stack dump:

starting version 230
[ 3.416647] ------------[ cut here ]------------
[ 3.417310] WARNING: CPU: 0 PID: 95 at /home/topi/d/linux.git/kernel/locking/lockdep.c:2871 lockdep_trace_alloc+0xb4/0xc0
[ 3.417605] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
[ 3.417923] Modules linked in:
[ 3.418288] CPU: 0 PID: 95 Comm: systemd-udevd Not tainted 4.7.0-rc5+ #97
[ 3.418444] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 3.418726] 0000000000000086 000000007970f3b0 ffff88000016fb00 ffffffff813c9c45
[ 3.418993] ffff88000016fb50 0000000000000000 ffff88000016fb40 ffffffff81091e9b
[ 3.419176] 00000b3705e2c798 0000000000000046 0000000000000410 00000000ffffffff
[ 3.419374] Call Trace:
[ 3.419511] [<ffffffff813c9c45>] dump_stack+0x67/0x92
[ 3.419644] [<ffffffff81091e9b>] __warn+0xcb/0xf0
[ 3.419745] [<ffffffff81091f1f>] warn_slowpath_fmt+0x5f/0x80
[ 3.419868] [<ffffffff810e9a84>] lockdep_trace_alloc+0xb4/0xc0
[ 3.419988] [<ffffffff8120dc42>] kmem_cache_alloc_node+0x42/0x600
[ 3.420156] [<ffffffff8110432d>] ? debug_lockdep_rcu_enabled+0x1d/0x20
[ 3.420170] [<ffffffff8163183b>] __alloc_skb+0x5b/0x1d0
[ 3.420170] [<ffffffff81144f6b>] audit_log_start+0x29b/0x480
[ 3.420170] [<ffffffff810a2925>] ? __lock_task_sighand+0x95/0x270
[ 3.420170] [<ffffffff81145cc9>] audit_log_cap_use+0x39/0xf0
[ 3.420170] [<ffffffff8109cd75>] ns_capable+0x45/0x70
[ 3.420170] [<ffffffff8109cdb7>] capable+0x17/0x20
[ 3.420170] [<ffffffff812a2f50>] oom_score_adj_write+0x150/0x2f0
[ 3.420170] [<ffffffff81230997>] __vfs_write+0x37/0x160
[ 3.420170] [<ffffffff810e33b7>] ? update_fast_ctr+0x17/0x30
[ 3.420170] [<ffffffff810e3449>] ? percpu_down_read+0x49/0x90
[ 3.420170] [<ffffffff81233d47>] ? __sb_start_write+0xb7/0xf0
[ 3.420170] [<ffffffff81233d47>] ? __sb_start_write+0xb7/0xf0
[ 3.420170] [<ffffffff81231048>] vfs_write+0xb8/0x1b0
[ 3.420170] [<ffffffff812533c6>] ? __fget_light+0x66/0x90
[ 3.420170] [<ffffffff81232078>] SyS_write+0x58/0xc0
[ 3.420170] [<ffffffff81001f2c>] do_syscall_64+0x5c/0x300
[ 3.420170] [<ffffffff81849c9a>] entry_SYSCALL64_slow_path+0x25/0x25
[ 3.420170] ---[ end trace fb586899fb556a5e ]---
[ 3.447922] random: systemd-udevd urandom read with 3 bits of entropy available
[ 4.014078] clocksource: Switched to clocksource tsc
Begin: Loading essential drivers ... done.

This is with QEMU, and the boot continues normally. On a real computer,
there's no such output and the system just seems to freeze.
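
One possible reading of the warning, offered as a sketch and not as a
confirmed diagnosis: the trace shows lockdep_trace_alloc() tripping
DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) for an allocation made
by audit_log_start(), reached from capable() inside
oom_score_adj_write(). That would fit an allocation that is allowed to
sleep being made while interrupts are disabled. The model below is
plain userspace Python; the flag values, the Cpu class and
alloc_is_safe() are all hypothetical names for illustration and are not
kernel APIs.

```python
from dataclasses import dataclass

# Hypothetical flag values for this model only; they are NOT the
# kernel's real GFP bit values.
GFP_DIRECT_RECLAIM = 0x1   # allocation may sleep to reclaim memory
GFP_FS = 0x2               # reclaim may call into filesystem code
GFP_KERNEL = GFP_DIRECT_RECLAIM | GFP_FS
GFP_ATOMIC = 0x0           # in this model: never sleeps


@dataclass
class Cpu:
    """Minimal stand-in for the per-CPU interrupt state."""
    irqs_disabled: bool = False


def alloc_is_safe(cpu: Cpu, gfp: int) -> bool:
    """Return False where the modelled check would warn.

    Assumption: the check only cares about allocations that may both
    sleep and recurse into filesystem reclaim, and it complains when
    such an allocation is attempted with interrupts disabled.
    """
    may_sleep_in_fs = bool(gfp & GFP_DIRECT_RECLAIM) and bool(gfp & GFP_FS)
    return not (may_sleep_in_fs and cpu.irqs_disabled)
```

In this model, a GFP_KERNEL-style allocation is flagged only when
interrupts are off, while a GFP_ATOMIC-style allocation passes in
either state.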

Could the deadlock happen because some I/O towards /sys/fs/cgroup
triggers a capability check, which in turn causes locking problems when
we try to print the cgroup list?
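
To make the suspected scenario concrete, here is a small userspace
model of it, under stated assumptions: a single non-recursive lock
stands in for cgroup_mutex, and every function name ending in _model is
hypothetical. It only illustrates the re-entrancy pattern (a capability
check made while the lock is already held, followed by an attempt to
take the same lock again); it does not show what the kernel actually
does on this path.

```python
import threading

# Stand-in for a non-recursive kernel lock such as cgroup_mutex.
# threading.Lock is not re-entrant, which is the property we need.
cgroup_mutex = threading.Lock()


def audit_cgroup_list_model() -> str:
    """Try to take the lock as the audit helper would.

    A non-blocking acquire is used so the model reports the problem
    instead of hanging the way a real self-deadlock would.
    """
    if not cgroup_mutex.acquire(blocking=False):
        return "would deadlock"
    try:
        return "listed cgroups"
    finally:
        cgroup_mutex.release()


def capable_model() -> str:
    """Capability check that logs the cgroup list as a side effect."""
    return audit_cgroup_list_model()


def cgroup_file_write_model() -> str:
    """Hypothetical cgroup write path: holds the lock, then checks a capability."""
    with cgroup_mutex:
        return capable_model()
```

Calling capable_model() with the lock free returns "listed cgroups",
while cgroup_file_write_model() returns "would deadlock" because the
same thread already holds the lock when the check runs.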

-Topi