Re: [PATCH] capabilities: add capability cgroup controller

From: Topi Miettinen
Date: Sun Jul 03 2016 - 11:08:57 EST


This is a multi-part message in MIME format.On 06/27/16 19:49, Serge E. Hallyn wrote:
> Quoting Tejun Heo (tj@xxxxxxxxxx):
>> Hello,
>>
>> On Mon, Jun 27, 2016 at 3:10 PM, Topi Miettinen <toiwoton@xxxxxxxxx> wrote:
>>> I'll have to study these more. But from what I saw so far, it looks to
>>> me that a separate tool would be needed to read taskstats and if that
>>> tool is not taken by distros, the users would not be any wiser, right?
>>> With cgroup (or /proc), no new tools would be needed.
>>
>> That is a factor but shouldn't be a deciding factor in designing our
>> user-facing interfaces. Please also note that kernel source tree
>> already has tools/ subdirectory which contains userland tools which
>> are distributed along with the kernel.
>
> And, if you take audit+cgroup approach then no tools are needed. So long
> as you can have audit print out the cgroups for a task as part of the
> capability audit record.
>

The attached patch would make any uses of capabilities generate audit
messages. It works for simple tests as you can see from the commit
message, but unfortunately the call to audit_cgroup_list() deadlocks the
system when booting a full blown OS. There's no deadlock when the call
is removed.

I guess that in some cases, cgroup_mutex and/or css_set_lock could be
already held earlier before entering audit_cgroup_list(). Holding the
locks is however required by task_cgroup_from_root(). Is there any way
to avoid this? For example, only print some kind of cgroup ID numbers
(are there unique and stable IDs, available without locks?) for those
cgroups where the task is registered in the audit message?

I could remove the cgroup part from the audit message entirely, but then
knowing which capabilities were used in what cgroup gets much more
difficult. The rest of the patch would be useful without it and of
course simpler.

In my earlier versions a per-task cap_used variable summarized all uses
of capabilities, but it was not clear when to reset the variable (fork?
exec? capset?), so it's gone for now. This was also used to rate limit
printing audit messages by only acting when each capability was first
used by the task, but now all uses of capabilities trigger audit
logging. Could that become a problem? I think it only makes sense to
summarize capability use per cgroup (via taskstats).

-Topi