Re: [-mm][PATCH 3/4] Add rlimit controller accounting and control

From: Balbir Singh
Date: Thu May 08 2008 - 10:55:26 EST


Paul Menage wrote:
> On Sat, May 3, 2008 at 2:38 PM, Balbir Singh <balbir@xxxxxxxxxxxxxxxxxx> wrote:
>> +
>> +int rlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
>> +{
>> +        int ret;
>> +        struct rlimit_cgroup *rcg;
>> +
>> +        rcu_read_lock();
>> +        rcg = rlimit_cgroup_from_task(rcu_dereference(mm->owner));
>> +        css_get(&rcg->css);
>> +        rcu_read_unlock();
>> +
>> +        ret = res_counter_charge(&rcg->as_res, (nr_pages << PAGE_SHIFT));
>> +        css_put(&rcg->css);
>> +        return ret;
>> +}
>
> You need to synchronize against mm->owner changing, or
> mm->owner->cgroups changing. How about:
>
> int rlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
> {
>         int ret;
>         struct rlimit_cgroup *rcg;
>         struct task_struct *owner;
>
>         rcu_read_lock();
> again:
>
>         /* Find and lock the owner of the mm */
>         owner = rcu_dereference(mm->owner);
>         task_lock(owner);
>         if (mm->owner != owner) {
>                 task_unlock(owner);
>                 goto again;
>         }
>
>         /* Charge the owner's cgroup with the new memory */
>         rcg = rlimit_cgroup_from_task(owner);
>         ret = res_counter_charge(&rcg->as_res, (nr_pages << PAGE_SHIFT));
>         task_unlock(owner);
>         rcu_read_unlock();
>         return ret;
> }
>

My mind goes blank at times, so forgive me for asking: what happens if we don't
use task_lock()? mm->owner cannot be freed under rcu_read_lock(), and even if it
changes, we get the mm_owner_changed() callback, which re-accounts the charges
against the new owner's cgroup. At the places from which we call _charge and
_uncharge, we also know that the mm is not going to change under us.
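
To spell the reasoning out against the charge path (the same code as quoted
above, only annotated with the guarantees being relied on; not a new version):

int rlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
{
        int ret;
        struct rlimit_cgroup *rcg;

        rcu_read_lock();
        /* RCU keeps the owner task_struct from being freed while we look */
        rcg = rlimit_cgroup_from_task(rcu_dereference(mm->owner));
        /* pin the css so the cgroup cannot go away once we drop RCU */
        css_get(&rcg->css);
        rcu_read_unlock();

        /*
         * If mm->owner changes around here, the mm_owner_changed()
         * callback (quoted further down) moves the accounting of
         * mm->total_vm to the new owner's cgroup.
         */
        ret = res_counter_charge(&rcg->as_res, (nr_pages << PAGE_SHIFT));
        css_put(&rcg->css);
        return ret;
}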

>> +
>> +void rlimit_cgroup_uncharge_as(struct mm_struct *mm, unsigned long nr_pages)
>> +{
>> +        struct rlimit_cgroup *rcg;
>> +
>> +        rcu_read_lock();
>> +        rcg = rlimit_cgroup_from_task(rcu_dereference(mm->owner));
>> +        css_get(&rcg->css);
>> +        rcu_read_unlock();
>> +
>> +        res_counter_uncharge(&rcg->as_res, (nr_pages << PAGE_SHIFT));
>> +        css_put(&rcg->css);
>> +}
>
> Can't this be implemented as just a call to charge() with a negative
> value? (Possibly fixing res_counter_charge() to handle negative values
> if necessary) Seems simpler.
>

I did that earlier, but Pavel suggested splitting it into separate charge and
uncharge operations, and I found that his suggestion made the code more
readable.
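
To my mind, part of the readability win is that only the charge side can fail.
Very roughly, with a hypothetical simplified counter (illustration only, not
the real res_counter code, locking elided), the two operations boil down to:

/* simplified illustration only -- not the real res_counter internals */
struct counter {
        unsigned long long usage;
        unsigned long long limit;
};

static int counter_charge(struct counter *cnt, unsigned long val)
{
        if (cnt->usage + val > cnt->limit)
                return -ENOMEM;         /* only the charge side can fail */
        cnt->usage += val;
        return 0;
}

static void counter_uncharge(struct counter *cnt, unsigned long val)
{
        if (WARN_ON(cnt->usage < val))  /* underflow would be a bug */
                val = cnt->usage;
        cnt->usage -= val;              /* never fails */
}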

>> +/*
>> + * TODO: get the attach callbacks to fail and disallow task movement.
>> + */
>
> You mean disallow all movement within a hierarchy that has this cgroup
> mounted? Doesn't that make it rather hard to use?
>

Consider the following scenario:

We try to move task "t1" from cgroup "A" to cgroup "B", and doing so would push
"B" over its limit. What do we do? Ideally, we would like to be able to go back
to the cgroup core and say "please fail the attach", since completing the move
would put "B" over its specified limit.
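
A rough, untested sketch of the sort of precheck the TODO is about, using the
can_attach() callback of this tree (a trial charge like this can still race
with the real charge done at attach time, which is part of why it remains a
TODO):

static int rlimit_cgroup_can_attach(struct cgroup_subsys *ss,
                                    struct cgroup *cgrp,
                                    struct task_struct *tsk)
{
        struct rlimit_cgroup *rcg = rlimit_cgroup_from_cgrp(cgrp);
        struct mm_struct *mm = get_task_mm(tsk);
        int ret = 0;

        if (!mm)
                return 0;

        /* trial charge against the destination cgroup */
        if (res_counter_charge(&rcg->as_res, (mm->total_vm << PAGE_SHIFT)))
                ret = -ENOMEM;          /* "B" would go over its limit */
        else
                res_counter_uncharge(&rcg->as_res,
                                     (mm->total_vm << PAGE_SHIFT));

        mmput(mm);
        return ret;
}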

>> +static void rlimit_cgroup_move_task(struct cgroup_subsys *ss,
>> +                                    struct cgroup *cgrp,
>> +                                    struct cgroup *old_cgrp,
>> +                                    struct task_struct *p)
>> +{
>> +        struct mm_struct *mm;
>> +        struct rlimit_cgroup *rcg, *old_rcg;
>> +
>> +        mm = get_task_mm(p);
>> +        if (mm == NULL)
>> +                return;
>> +
>> +        rcu_read_lock();
>> +        if (p != rcu_dereference(mm->owner))
>> +                goto out;
>> +
>> +        rcg = rlimit_cgroup_from_cgrp(cgrp);
>> +        old_rcg = rlimit_cgroup_from_cgrp(old_cgrp);
>> +
>> +        if (rcg == old_rcg)
>> +                goto out;
>> +
>> +        if (res_counter_charge(&rcg->as_res, (mm->total_vm << PAGE_SHIFT)))
>> +                goto out;
>> +        res_counter_uncharge(&old_rcg->as_res, (mm->total_vm << PAGE_SHIFT));
>> +out:
>> +        rcu_read_unlock();
>> +        mmput(mm);
>> +}
>> +
>
> Since you need to protect against concurrent charges, and against
> concurrent mm ownership changes, I think you should just do this under
> task_lock(p).

Please see my comment above on task_lock(). I'll draw some diagrams on the
whiteboard and check the sanity of my argument.

>
>> +static void rlimit_cgroup_mm_owner_changed(struct cgroup_subsys *ss,
>> +                                           struct cgroup *cgrp,
>> +                                           struct cgroup *old_cgrp,
>> +                                           struct task_struct *p)
>> +{
>> +        struct rlimit_cgroup *rcg, *old_rcg;
>> +        struct mm_struct *mm = get_task_mm(p);
>> +
>> +        BUG_ON(!mm);
>> +        rcg = rlimit_cgroup_from_cgrp(cgrp);
>> +        old_rcg = rlimit_cgroup_from_cgrp(old_cgrp);
>> +        if (res_counter_charge(&rcg->as_res, (mm->total_vm << PAGE_SHIFT)))
>> +                goto out;
>> +        res_counter_uncharge(&old_rcg->as_res, (mm->total_vm << PAGE_SHIFT));
>> +out:
>> +        mmput(mm);
>> +}
>> +
>
> Also needs to task_lock(p) to prevent concurrent charges or cgroup
> reassignments?
>

The callbacks are called with task_lock() held. Please see the
mm_update_next_owner() -> cgroup_mm_owner_callbacks() path in kernel/exit.c.
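
Abbreviated and from memory (not verbatim), that call site looks roughly like:

        /* mm_update_next_owner(), abbreviated */
        get_task_struct(c);
        task_lock(c);                   /* new owner is task_lock()ed ... */
        read_unlock(&tasklist_lock);
        if (c->mm != mm) {
                task_unlock(c);
                put_task_struct(c);
                goto retry;
        }
        /* ... so the subsystem callbacks run under task_lock() */
        cgroup_mm_owner_callbacks(mm->owner, c);
        mm->owner = c;
        task_unlock(c);
        put_task_struct(c);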

--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL