Re: [PATCH v3 2/7] socket: initial cgroup code.

From: Glauber Costa
Date: Wed Sep 21 2011 - 15:00:43 EST


On 09/21/2011 03:47 PM, Greg Thelen wrote:
On Sun, Sep 18, 2011 at 5:56 PM, Glauber Costa <glommer@xxxxxxxxxxxxx> wrote:
We aim to control the amount of kernel memory pinned at any
time by tcp sockets. To lay the foundations for this work,
this patch adds a pointer to the kmem_cgroup to the socket
structure.

Signed-off-by: Glauber Costa <glommer@xxxxxxxxxxxxx>
CC: David S. Miller <davem@xxxxxxxxxxxxx>
CC: Hiroyuki Kamezawa <kamezawa.hiroyu@xxxxxxxxxxxxxx>
CC: Eric W. Biederman <ebiederm@xxxxxxxxxxxx>
...
+void sock_update_memcg(struct sock *sk)
+{
+ /* right now a socket spends its whole life in the same cgroup */
+ BUG_ON(sk->sk_cgrp);
+
+ rcu_read_lock();
+ sk->sk_cgrp = mem_cgroup_from_task(current);
+
+ /*
+ * We don't need to protect against anything task-related, because
+ * we are basically stuck with the sock pointer that won't change,
+ * even if the task that originated the socket changes cgroups.
+ *
+ * What we do have to guarantee, is that the chain leading us to
+ * the top level won't change under our noses. Incrementing the
+ * reference count via cgroup_exclude_rmdir guarantees that.
+ */
+ cgroup_exclude_rmdir(mem_cgroup_css(sk->sk_cgrp));

This grabs a css_get() reference, which prevents rmdir (will return
-EBUSY).
Yes.

How long is this reference held?
For the socket lifetime.

I wonder about the case
where a process creates a socket in memcg M1 and later is moved into
memcg M2. At that point an admin would expect to be able to 'rmdir
M1'. I think this rmdir would return -EBUSY and I suspect it would be
difficult for the admin to understand why the rmdir of M1 failed. It
seems that to rmdir a memcg, an admin would have to kill all processes
that allocated sockets while in M1. Such processes may not still be
in M1.

+ rcu_read_unlock();
+}
I agree. But I also don't see many ways around it without implementing full task migration.

Right now I am working under the assumption that tasks are long-lived inside the cgroup. Migration potentially introduces some nasty locking problems in the mem_schedule path.

Also, unless I am missing something, the memcg already has the policy of
not carrying charges around, probably because of this very same complexity.

True, at least that policy won't give you -EBUSY... But I think holding the reference is at least a way to guarantee that the cgroup won't disappear from under our noses in the middle of our allocations.
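
For readers following the rmdir scenario discussed above, here is a minimal userspace sketch of it. It is only a model under stated assumptions: every name in it (model_cgroup, model_task, model_sock and the model_* functions) is hypothetical and none of it is kernel code or the patch's implementation. The "pin" is a plain counter standing in for the reference taken in sock_update_memcg(), and model_rmdir() simply returns -EBUSY while tasks or pins remain, which is the behavior described in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT kernel types. */
struct model_cgroup {
	const char *name;
	int tasks;	/* tasks currently attached to the group */
	int pins;	/* references held by sockets created in the group */
	int removed;	/* set once model_rmdir() succeeds */
};

struct model_task {
	struct model_cgroup *cgrp;
};

struct model_sock {
	struct model_cgroup *cgrp;
};

/* Attach a task to a group, detaching it from its previous one, if any. */
void model_task_move(struct model_task *t, struct model_cgroup *to)
{
	if (t->cgrp)
		t->cgrp->tasks--;
	t->cgrp = to;
	to->tasks++;
	/* Sockets the task already created are deliberately not migrated. */
}

/* Same shape as sock_update_memcg(): bind once, then pin the group. */
void model_sock_update(struct model_sock *sk, const struct model_task *t)
{
	assert(sk->cgrp == NULL);	/* stands in for BUG_ON(sk->sk_cgrp) */
	sk->cgrp = t->cgrp;
	sk->cgrp->pins++;
}

/* Drop the pin when the socket goes away. */
void model_sock_release(struct model_sock *sk)
{
	if (sk->cgrp) {
		sk->cgrp->pins--;
		sk->cgrp = NULL;
	}
}

/* Removal is refused while any task or socket pin remains. */
int model_rmdir(struct model_cgroup *cg)
{
	if (cg->tasks > 0 || cg->pins > 0)
		return -EBUSY;
	cg->removed = 1;
	return 0;
}
```

In this model, a task that creates a socket in M1 and is then moved to M2 leaves M1 with no tasks but one pin, so model_rmdir() on M1 keeps returning -EBUSY until the socket is released. That is the situation an admin would find hard to diagnose, since nothing in M1's task list explains the failure.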
