Re: [RFC] cpuset: Enable changing of top_cpuset's mems_allowed nodemask

From: Mel Gorman
Date: Wed Feb 01 2017 - 04:19:04 EST


On Wed, Feb 01, 2017 at 01:01:24PM +0530, Anshuman Khandual wrote:
> On 01/31/2017 09:30 PM, Mel Gorman wrote:
> > On Tue, Jan 31, 2017 at 07:52:37PM +0530, Anshuman Khandual wrote:
> >> At present, top_cpuset.mems_allowed is the same as node_states[N_MEMORY]
> >> and cannot be changed at runtime. The maximum possible
> >> node_states[N_MEMORY] is also reflected in the top_cpuset.effective_mems
> >> interface. This prevents anyone from removing or restricting memory
> >> placement system wide on a given memory node through the cpuset
> >> mechanism, which might be limiting. This patch solves the problem by
> >> enabling update_nodemask() to accept changes to top_cpuset.mems_allowed
> >> as well. Once changed, it also updates the value of
> >> top_cpuset.effective_mems and the mems_allowed nodemask of all its
> >> tasks. It calls cpuset_inc() to make sure the cpuset is accounted for in
> >> the buddy allocator through the cpusets_enabled() check.
> >>
> >
> > What's the point of allowing the root cpuset to be restricted?
>
> After an extended period of run time on a system, if we currently have
> to run HW diagnostics and dumps (which are run out of band) for debugging
> purposes, we have to stop further allocations to the node. Hot plugging
> the memory node out of the kernel will achieve this. But it can also
> be made possible by just enabling top_cpuset.memory_migrate and then
> restricting all allocations by removing the node from the
> top_cpuset.mems_allowed nodemask. This will force all the existing
> allocations out of the target node.
>

So would creating a restricted cpuset and migrating all tasks from the
root cpuset into it.
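
For illustration, the alternative described above could be sketched roughly
as follows. This is a minimal sketch, assuming a cgroup v1 cpuset hierarchy
with the usual cpuset.cpus, cpuset.mems, cpuset.memory_migrate and tasks
files. The function names, the mount point, the child name and the CPU and
node lists are hypothetical and not taken from the thread.

```python
import os


def write_value(path, value):
    """Write a single value to a cgroup control file."""
    with open(path, "w") as f:
        f.write(value)


def move_root_tasks(cpuset_root, name, cpus, mems):
    """Create a child cpuset restricted to `mems` and move root tasks into it.

    Hypothetical helper: `cpuset_root` is assumed to be a cgroup v1 cpuset
    mount point, for example /sys/fs/cgroup/cpuset.
    """
    child = os.path.join(cpuset_root, name)
    os.makedirs(child, exist_ok=True)

    # cpus and mems must be populated before any task can be attached.
    write_value(os.path.join(child, "cpuset.cpus"), cpus)
    write_value(os.path.join(child, "cpuset.mems"), mems)
    # Ask the kernel to migrate existing pages when tasks are moved in.
    write_value(os.path.join(child, "cpuset.memory_migrate"), "1")

    with open(os.path.join(cpuset_root, "tasks")) as f:
        pids = f.read().split()

    moved = []
    for pid in pids:
        try:
            # The tasks file takes one PID per write.
            write_value(os.path.join(child, "tasks"), pid)
            moved.append(pid)
        except OSError:
            # Some tasks cannot be moved (for example per-CPU kernel
            # threads, or tasks that exited in the meantime); skip them.
            continue
    return moved
```

A call such as move_root_tasks("/sys/fs/cgroup/cpuset", "restricted", "0-3",
"0") would need root privileges; the path and values here are placeholders.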

> More importantly, it also extends the cpuset memory restriction feature
> to its logical completion without adding any regressions for the
> existing use cases. Then why not do this? Does it add any overhead?
>

It violates the expectation that the root cgroup can access all
resources. Once enabled, there is some overhead in the page allocator as
it must check all cpusets even for tasks that weren't configured to be
isolated.
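
The overhead point can be illustrated with a toy model. This is only a
sketch under stated assumptions, not the kernel's page allocator: the class
and the pick_node() method are hypothetical, and only the names cpuset_inc()
and cpusets_enabled() come from the thread. The model assumes the nodemask
check is skipped entirely until cpuset_inc() has been called at least once,
after which every allocation pays for it.

```python
class AllocatorModel:
    """Toy model of an allocator whose nodemask check is gated by a counter."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.enabled_count = 0   # incremented by cpuset_inc()
        self.mask_checks = 0     # number of nodemask checks performed

    def cpuset_inc(self):
        self.enabled_count += 1

    def cpusets_enabled(self):
        return self.enabled_count > 0

    def pick_node(self, mems_allowed):
        """Return the first node the caller may allocate from, or None."""
        for node in self.nodes:
            if self.cpusets_enabled():
                # Once enabled, every candidate node costs a mask check,
                # even for tasks whose mask allows all nodes.
                self.mask_checks += 1
                if node not in mems_allowed:
                    continue
            return node
        return None


model = AllocatorModel(nodes=[0, 1])
model.pick_node({0, 1})    # no check is performed while disabled
model.cpuset_inc()
model.pick_node({0, 1})    # an unrestricted task now pays for a check too
```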

--
Mel Gorman
SUSE Labs