Re: [PATCH v3 4/7] memcg: remove memcg from the reclaim iterators

From: Johannes Weiner
Date: Tue Feb 12 2013 - 11:34:07 EST




Michal Hocko <mhocko@xxxxxxx> wrote:

>On Tue 12-02-13 10:10:02, Johannes Weiner wrote:
>> On Tue, Feb 12, 2013 at 10:54:19AM +0100, Michal Hocko wrote:
>> > On Mon 11-02-13 17:39:43, Johannes Weiner wrote:
>> > > On Mon, Feb 11, 2013 at 10:27:56PM +0100, Michal Hocko wrote:
>> > > > On Mon 11-02-13 14:58:24, Johannes Weiner wrote:
>> > > > > That way, if the dead count gives the go-ahead, you KNOW that the
>> > > > > position cache is valid, because it has been updated first.
>> > > >
>> > > > OK, you are right. We can live without css_tryget because dead_count is
>> > > > either OK which means that css would be alive at least this rcu period
>> > > > (and RCU walk would be safe as well) or it is incremented which means
>> > > > that we have started css_offline already and then css is dead already.
>> > > > So css_tryget can be dropped.
>> > >
>> > > Not quite :)
>> > >
>> > > The dead_count check is for completed destructions,
>> >
>> > Not quite :P. dead_count is incremented in css_offline callback which is
>> > called before the cgroup core releases its last reference and unlinks
>> > the group from the siblings. css_tryget would already fail at this stage
>> > because CSS_DEACT_BIAS is in place at that time but this doesn't break
>> > RCU walk. So I think we are safe even without css_get.
>>
>> But you drop the RCU lock before you return.
>>
>> dead_count IS incremented for every destruction, but it's not reliable
>> for concurrent ones, is what I meant. Again, if there is a dead_count
>> mismatch, your pointer might be dangling, easy case. However, even if
>> there is no mismatch, you could still race with a destruction that has
>> marked the object dead, and then frees it once you drop the RCU lock,
>> so you need try_get() to check if the object is dead, or you could
>> return a pointer to freed or soon to be freed memory.
>
>Wait a moment. But what prevents the following race?
>
>rcu_read_lock()
> mem_cgroup_css_offline(memcg)
> root->dead_count++
>iter->last_dead_count = root->dead_count

Use the dead count that was read the first time for the comparison, i.e. do only one atomic read in that function. You are right, we would otherwise fail to account for that concurrent destruction.

>iter->last_visited = memcg
> // final
> css_put(memcg);
>// last_visited is still valid
>rcu_read_unlock()
>[...]
>// next iteration
>rcu_read_lock()
>iter->last_dead_count == root->dead_count
>// KABOOM
>
>The race window between dead_count++ and css_put is quite big but that
>is not important because that css_put can happen anytime before we start
>the next iteration and take rcu_read_lock.
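
To make the single-read idea concrete, here is a rough userspace sketch of the race above. It is only a model, not the patch: struct group, struct iter_cache, iter_load(), iter_store(), group_offline() and stale_pointer_returned() are hypothetical names, C11 atomics stand in for the kernel's atomic_t, and RCU, memory barriers and css_tryget() are left out entirely (the real code would still need the tryget, for the reasons given earlier in the thread).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the hierarchy root; not the kernel struct. */
struct group {
	atomic_int dead_count;	/* bumped once per destroyed descendant */
};

/* Hypothetical stand-in for the per-zone, per-priority reclaim iterator. */
struct iter_cache {
	struct group *last_visited;
	int last_dead_count;
};

/* Models the css_offline callback: account one destruction in the root. */
static void group_offline(struct group *root)
{
	atomic_fetch_add(&root->dead_count, 1);
}

/*
 * Snapshot dead_count exactly once into *seq and trust the cached
 * position only if no destruction was recorded since it was stored.
 */
static struct group *iter_load(struct iter_cache *it, struct group *root,
			       int *seq)
{
	*seq = atomic_load(&root->dead_count);
	if (it->last_dead_count == *seq)
		return it->last_visited;
	return NULL;
}

/* Store the new position together with the snapshot taken at load time. */
static void iter_store(struct iter_cache *it, struct group *pos, int seq)
{
	it->last_visited = pos;
	it->last_dead_count = seq;
}

/*
 * Replays the interleaving from the quoted diagram. With
 * reread_counter == true the counter is read a second time before the
 * store, which is the variant that goes KABOOM.
 */
static bool stale_pointer_returned(bool reread_counter)
{
	struct group root = { 0 };
	struct group memcg = { 0 };
	struct iter_cache it = { NULL, 0 };
	int seq;

	iter_load(&it, &root, &seq);	/* iteration starts, snapshot is 0 */
	group_offline(&root);		/* concurrent css_offline of memcg */
	if (reread_counter)
		seq = atomic_load(&root.dead_count);
	iter_store(&it, &memcg, seq);
	/* the final css_put(memcg) would happen here */

	/* next iteration: is the dead position handed out again? */
	return iter_load(&it, &root, &seq) == &memcg;
}
```

In this model stale_pointer_returned(true) hands the dead position back on the next iteration, while stale_pointer_returned(false) sees the counter mismatch and starts over from NULL.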

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.