Re: [PATCH 7/8] memcg: get rid of mm_struct::owner

From: Johannes Weiner
Date: Wed Jul 29 2015 - 12:43:29 EST


On Wed, Jul 29, 2015 at 05:05:49PM +0200, Michal Hocko wrote:
> On Wed 29-07-15 09:14:54, Johannes Weiner wrote:
> > On Tue, Jul 14, 2015 at 05:18:23PM +0200, Michal Hocko wrote:
> [...]
> > > 3) fail mem_cgroup_can_attach if we are trying to migrate a task sharing
> > > mm_struct with a process outside of the tset. If I understand the
> > > tset properly this would require all the sharing tasks to be migrated
> > > together and we would never end up with task_css != &task->mm->css.
> > > __cgroup_procs_write doesn't seem to support multi pid move currently
> > > AFAICS, though. cgroup_migrate_add_src, however, seems to be intended
> > > for this purpose so this should be doable. Without that support we would
> > > basically disallow migrating these tasks - I wouldn't object if you ask
> > > me.
> >
> > I'd prefer not adding controller-specific failure modes for attaching,
>
> Does this mean that there is a plan to drop the return value from
> can_attach? I can see that both cpuset and cpu controllers currently
> allow an attach to fail. Are those going to change? I remember some
> discussions but no clear outcome of those.

Nothing but the realtime stuff needs to be able to fail migration due
to controller constraints. This should probably remain a fringe thing,
because it does make for a much more ambiguous interface.

So I think can_attach() will have to stay, but it should be avoided.

> > and this too would lead to very non-obvious behavior.
>
> Yeah, the user will not get an error source with the current API but
> this is an inherent restriction currently. Maybe we can add a knob with
> the error source?
>
> If there is a clear consensus that can_attach failures are a no
> go then what about "silent" moving of the associated tasks? This would
> be similar to thread group except the group would be a more generic term.
>
> > > Do you see other options? From the above three options the 3rd one
> > > sounds the most sane to me and the 1st quite easy to implement. Both will
> > > require some cgroup core work though. But maybe we would be good enough
> > > with 3rd option without supporting moving schizophrenic tasks and that
> > > would be reduced to memcg code.
> >
> > A modified form of 1) would be to track the mms referring to a memcg
> > but during offline search the process tree for a matching task.
>
> But we might have many of those and all of them living in different
> cgroups. So which one do we take? The first encountered, the one with
> the majority? I am not sure this is much better.
>
> I would really prefer if we could get rid of the schizophrenia if it is
> possible.

The first encountered.

This is just our model for sharing memory across groups. Page cache,
writeback, address space--we have always accounted based on who
touches it first. We might as well stick with it for shared mms.
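
To make "the first encountered" concrete, here is a rough userspace
sketch of the selection rule only. It is not kernel code: struct mm,
struct task, first_task_using_mm() and pick_group_for_mm() are
hypothetical stand-ins invented for illustration, and a flat array
stands in for walking the process tree. Locking, RCU and the actual
offline path are ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct mm {
	int id;
};

struct task {
	int pid;
	const struct mm *mm;	/* address space, possibly shared with other tasks */
	int cgroup_id;		/* group the task currently belongs to */
};

/*
 * Return the first task in walk order that uses @mm, or NULL if none does.
 * "First encountered" wins; no attempt is made to find a majority.
 */
static const struct task *first_task_using_mm(const struct task *tasks,
					       size_t n, const struct mm *mm)
{
	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (tasks[i].mm == mm)
			return &tasks[i];
	return NULL;
}

/*
 * Pick the group that takes over @mm once its old group goes offline.
 * Returns -1 when no remaining task uses @mm.
 */
static int pick_group_for_mm(const struct task *tasks, size_t n,
			     const struct mm *mm)
{
	const struct task *t = first_task_using_mm(tasks, n, mm);

	return t ? t->cgroup_id : -1;
}
```

With tasks in groups 7, 3 and 5, where the last two share an mm, the
shared mm would go to group 3 simply because that task is found first.
The result depends on walk order, which is the same arbitrariness we
already accept for first-touch accounting elsewhere.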