Re: [PATCH 07/10] memcg: add dirty limits to mem_cgroup

From: KAMEZAWA Hiroyuki
Date: Mon Oct 11 2010 - 21:01:20 EST

Next message: KAMEZAWA Hiroyuki: "Re: Problem: scaling of /proc/stat on large systems"
Previous message: Neeraj Kumar: "Re: Doing a zero-copy move of data from a kernel buffer to hard disk"
In reply to: Greg Thelen: "Re: [PATCH 07/10] memcg: add dirty limits to mem_cgroup"
Next in thread: Greg Thelen: "Re: [PATCH 07/10] memcg: add dirty limits to mem_cgroup"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Mon, 11 Oct 2010 17:24:21 -0700
Greg Thelen <gthelen@xxxxxxxxxx> wrote:

> >> Is your motivation to increase performance with the same functionality?
> >> If so, then would a 'static inline' be performance equivalent to a
> >> preprocessor macro yet be safer to use?
> >>
> > Ah, if lockdep finds this as bug, I think other parts will hit this,
> > too. like this.
> >> static struct mem_cgroup *try_get_mem_cgroup_from_mm(struct mm_struct *mm)
> >> {
> >> struct mem_cgroup *mem = NULL;
> >>
> >> if (!mm)
> >> return NULL;
> >> /*
> >> * Because we have no locks, mm->owner's may be being moved to other
> >> * cgroup. We use css_tryget() here even if this looks
> >> * pessimistic (rather than adding locks here).
> >> */
> >> rcu_read_lock();
> >> do {
> >> mem = mem_cgroup_from_task(rcu_dereference(mm->owner));
> >> if (unlikely(!mem))
> >> break;
> >> } while (!css_tryget(&mem->css));
> >> rcu_read_unlock();
> >> return mem;
> >> }
>
> mem_cgroup_from_task() calls task_subsys_state() calls
> task_subsys_state_check(). task_subsys_state_check() will be happy if
> rcu_read_lock is held.
>
yes.

> I don't think that this will fail lockdep, because rcu_read_lock_held()
> is true when calling mem_cgroup_from_task() within
> try_get_mem_cgroup_from_mm()..
>
agreed.

> > mem_cgroup_from_task() is designed to be used as this.
> > If dqefined as macro, I think it will not be catched.
>
> I do not understand how making mem_cgroup_from_task() a macro will
> change its behavior wrt. to lockdep assertion checking. I assume that
> as a macro mem_cgroup_from_task() would still call task_subsys_state(),
> which requires either:
> a) rcu read lock held
> b) task->alloc_lock held
> c) cgroup lock held
>

Hmm. Maybe I was wrong.

>
> >> Maybe it makes more sense to find a way to perform this check in
> >> mem_cgroup_has_dirty_limit() without needing to grab the rcu lock. I
> >> think this lock grab is unneeded. I am still collecting performance
> >> data, but suspect that this may be making the code slower than it needs
> >> to be.
> >>
> >
> > Hmm. css_set[] itself is freed by RCU..what idea to remove rcu_read_lock() do
> > you have ? Adding some flags ?
>
> It seems like a shame to need a lock to determine if current is in the
> root cgroup. Especially given that as soon as
> mem_cgroup_has_dirty_limit() returns, the task could be moved
> in-to/out-of the root cgroup thereby invaliding the answer. So the
> answer is just a sample that may be wrong.

Yes. But it's not a bug but a specification.

> But I think you are correct.
> We will need the rcu read lock in mem_cgroup_has_dirty_limit().
>

yes.

> > Ah...I noticed that you should do
> >
> > mem = mem_cgroup_from_task(current->mm->owner);
> >
> > to check has_dirty_limit...
>
> What are the cases where current->mm->owner->cgroups !=
> current->cgroups?
>
In that case, assume group A and B.

thread(1) -> belongs to cgroup A (thread(1) is mm->owner)
thread(2) -> belongs to cgroup B
and
a page -> charnged to cgroup A

Then, thread(2) make the page dirty which is under cgroup A.

In this case, if page's dirty_pages accounting is added to cgroup B, cgroup B'
statistics may show "dirty_pages > all_lru_pages". This is bug.

> I was hoping to avoid having add even more logic into
> mem_cgroup_has_dirty_limit() to handle the case where current->mm is
> NULL.
>

Blease check current->mm. We can't limit works of kernel-thread by this, let's
consider it later if necessary.

> Presumably the newly proposed vm_dirty_param(),
> mem_cgroup_has_dirty_limit(), and mem_cgroup_page_stat() routines all
> need to use the same logic. I assume they should all be consistently
> using current->mm->owner or current.
>

please.

Thanks,
-Kame

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: KAMEZAWA Hiroyuki: "Re: Problem: scaling of /proc/stat on large systems"
Previous message: Neeraj Kumar: "Re: Doing a zero-copy move of data from a kernel buffer to hard disk"
In reply to: Greg Thelen: "Re: [PATCH 07/10] memcg: add dirty limits to mem_cgroup"
Next in thread: Greg Thelen: "Re: [PATCH 07/10] memcg: add dirty limits to mem_cgroup"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]