Re: [PATCH -mm 2/2] memcg: fix css reference leak and endless loopin mem_cgroup_iter

From: Hugh Dickins
Date: Thu Jan 23 2014 - 05:43:46 EST

On Wed, 22 Jan 2014, Michal Hocko wrote:
> On Tue 21-01-14 13:18:42, Hugh Dickins wrote:
> [...]
> > We do have a confusing situation. The hang goes back to 3.10 but takes
> > two different forms, because of intervening changes: in 3.10 and 3.11
> > mem_cgroup_iter repeatedly returns root memcg to its caller, in 3.12 and
> > 3.13 mem_cgroup_iter repeatedly gets NULL memcg from mem_cgroup_iter_next
> > and cannot return to its caller.
> >
> > Patch 1/2 is what's needed to fix 3.10 and 3.11 (and applies correctly
> > to 3.11, but will have to be rediffed for 3.10 because of rearrangement
> > in between).
> I will backport it when it reaches stable queue.

Thank you.

Testing, at home and at work, has confirmed your two patches are good.
Greg's particular test on 3.11 gave convincing results, and I was helped
by Linus's tree of last night (df32e43a54d0) happening to be quite quick
to reproduce the issue on my laptop.

> > Patch 2/2 is what's needed to fix 3.12 and 3.13 (but applies
> > correctly to neither of them because it's diffed on top of my CSS_ONLINE
> > fix). Patch 1/2 is correct but unnecessary in 3.12 and 3.13: I'm unclear
> > whether Michal is claiming that it would also fix the hang in 3.12 and
> > 3.13 if we didn't have 2/2: I doubt that, and haven't tested that.
> Actually both patches are needed. If we had only 2/2 then we wouldn't
> endless loop inside mem_cgroup_iter but we could still return root to
> caller all the time because mem_cgroup_iter_load would return NULL on
> css_tryget failure on the cached root. Or am I missing something that
> would prevent that?

In theory I agree with you; and if you're missing something, I can't see
it either. But in practice, all my earlier testing of 3.12 and 3.13 was
just with 2/2, and I've tried without your 1/2 since: whereas I have hung
on 3.12 and 3.13 a convincing number of times without 2/2, I have never
hung on them with 2/2 without 1/2. In practice 1/2 appears essential
for 3.10 and 3.11, but irrelevant for 3.12 and 3.13.

That could be easy to explain if there were a difference at the calling
end, shrink_zone(), between those releases: but I don't see that. Odd.
Either we're both missing something, or my testing is even less reliable
than I'd thought. But since I certainly don't dispute 1/2, it is merely
academic. Though still bothersome.

> > Given how Michal has diffed this patch on top of my CSS_ONLINE one
> > (mm-memcg-iteration-skip-memcgs-not-yet-fully-initialized.patch),
> > it would be helpful if you could mark that one also for stable 3.12+,
> > to save us from having to rediff this one for stable. We don't have
> > a concrete example of a problem it solves in the vanilla kernel, but
> > it makes more sense to include it than to exclude it.
> Yes, I think it makes sense to queue it for 3.12+ as well because it is
> non intrusive and potential issues would be really subtle.

Before Andrew sends these all off to Linus, I should admit that there's
probably a refinement still to come to the CSS_ONLINE one. I'm ashamed
to admit that I overlooked a much earlier comment from Greg Thelen, who
suggested that a memory barrier might be required. I think he's right,
and I'd have liked to say exactly what and where before answering you
now; but barriers are tricky elusive things, I've not yet decided,
and better report back to you now on the testing result.


> > (You would be right to point out that the CSS_ONLINE one fixes
> > something that goes back to 3.10: I'm saying 3.12+ because I'm not
> > motivated to rediff it for 3.10 and 3.11 when there's nothing to
> > go on top; but that's not a very good reason to lie - overrule me.)
> >
> > Hugh
> --
> Michal Hocko
> SUSE Labs
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at