Re: [PATCH v3 5/5] cpusets, suspend: Save and restore cpusets during suspend/resume

From: Peter Zijlstra
Date: Tue May 15 2012 - 16:11:10 EST


On Tue, 2012-05-15 at 11:31 -0700, David Rientjes wrote:

> However, if a thread did set_mempolicy(MPOL_BIND, 2-3) where cpuset.mems
> == node_online_map, cpuset.mems changes to 0-1, then cpuset.mems changes
> back to node_online_map, then I believe (and implemented in the mempolicy
> code and added the specification in the man page) that the thread should
> be bound to nodes 2-3.

I disagree, but alas that is done :-(

But what happens if you unplug nodes 2-3?

> > > I fixed this problem by introducing MPOL_F_* flags in set_mempolicy(2)
> > > by saving the user intended nodemask passed by set_mempolicy() and
> > > respecting it whenever allowed by cpusets.
> >
> > So, if you read that thread, this is what (in essence) Srivatsa proposed
> > in v2. We store the user-defined cpumask and keep it regardless of
> > kernel decisions. We intersect the user-defined cpumask with the kernel
> > (which is really reflecting the administrator's hotplug decisions)
> > topology and run tasks in constrained cpusets on the result. We reflect
> > this decision in a new read-only file in each cpuset that indicates the
> > "actual" cpus that a task in a given cpuset may be scheduled on.
> >
>
> I don't think we need a new read-only file that exposes the stored
> cpumask, I think it should be stored and respected when possible and the
> set of allowed cpus be exported in the way it always has been, through
> cpuset.cpus.

I agree we don't want the new file; I'm not sure what you mean by the
rest though.

> If a cpuset is defined to have cpuset.cpus == 2-3, cpu 3 is offlined, and
> then cpu 3 is onlined, the behavior is currently undefined.

Uhm, it's documented to not restore 3. And changing this at this point
seems pointless: it doesn't solve Srivatsa's problem and is otherwise
just churn.

> You could
> make the argument that cpusets is purely about NUMA and that cpu 3 may no
> longer have affinity to cpuset.mems in which case I would agree that we
> should not reset cpuset.cpus to 2-3 in this case. But that doesn't seem
> to be the motivation here because we keep talking about suspend.

The problem is that if you have some cpusets configuration and then do a
suspend/resume (s/r) cycle, the entire configuration is wrecked, because
suspend hot-unplugs all but cpu0 and resume re-plugs the cpus.

This destroys all masks and migrates all tasks in sets not including
cpu0 to the root set.
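
To make the failure mode concrete, here is a toy model of it, not kernel
code. The dict layout, the set names "a" and "b", and the helper names
are all made up for illustration; the only behaviour modelled is what is
described above: unplug strips the cpu from every mask, emptied sets
lose their tasks to the root set, and replug only repopulates the root.

```python
def offline_cpu(cpusets, tasks, cpu):
    """Destructively remove `cpu` from every cpuset (toy model)."""
    for name, cpus in cpusets.items():
        cpus.discard(cpu)
        if not cpus and name != "root":
            # An emptied set cannot run anything: move its tasks to root.
            tasks["root"].extend(tasks[name])
            tasks[name] = []


def online_cpu(cpusets, cpu):
    """Only the root set follows the online map; other masks stay as they are."""
    cpusets["root"].add(cpu)


def suspend_resume(cpusets, tasks):
    """Suspend unplugs every cpu except 0; resume plugs them back in."""
    others = sorted(cpusets["root"] - {0})
    for cpu in others:
        offline_cpu(cpusets, tasks, cpu)
    for cpu in others:
        online_cpu(cpusets, cpu)


# Hypothetical configuration: two sets, one of which does not include cpu0.
cpusets = {"root": {0, 1, 2, 3}, "a": {0, 1}, "b": {2, 3}}
tasks = {"root": [], "a": ["t1"], "b": ["t2"]}
suspend_resume(cpusets, tasks)
```

After the cycle, in this model, "a" has shrunk to cpu0 only, "b" is
empty, and the task that lived in "b" sits in the root set, even though
all four cpus are online again.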

Srivatsa proposed to 'fix' this by remembering state across regular
hotplug, which I strongly oppose: hotplug is destructive and should be;
there's no point in remembering state that might never be used again.
Worse, you temporarily 'break' your cpuset 'promise' and then silently
restore it.

The s/r case is special in that userspace isn't actually around to
observe the cpus going away and coming back; it also has the guarantee
that the cpus will be coming back.

So s/r is special and should not destroy state, hotplug should.
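
A minimal sketch of that split, again a toy model and not the patch
itself: suspend takes a snapshot of the masks before unplugging and
resume puts the snapshot back, while a regular unplug/replug stays
destructive. The function names and the dict-of-sets layout are
assumptions made up for this example.

```python
import copy


def hotplug_offline(cpusets, cpu):
    """Regular hotplug: drop the cpu from every mask and remember nothing."""
    for cpus in cpusets.values():
        cpus.discard(cpu)


def hotplug_online(cpusets, cpu):
    """Regular hotplug: only the root set regains the cpu."""
    cpusets["root"].add(cpu)


def suspend(cpusets):
    """Snapshot the masks, then unplug everything except cpu0."""
    saved = copy.deepcopy(cpusets)
    for cpu in sorted(cpusets["root"] - {0}):
        hotplug_offline(cpusets, cpu)
    return saved


def resume(cpusets, saved):
    """The cpus are guaranteed to be back, so restore the snapshot wholesale."""
    cpusets.clear()
    cpusets.update(copy.deepcopy(saved))


original = {"root": {0, 1, 2, 3}, "a": {0, 1}, "b": {2, 3}}

# s/r cycle: the configuration survives.
after_sr = copy.deepcopy(original)
resume(after_sr, suspend(after_sr))

# Regular hotplug of cpu 3: "b" stays at {2} after the cpu returns.
after_hotplug = copy.deepcopy(original)
hotplug_offline(after_hotplug, 3)
hotplug_online(after_hotplug, 3)
```

The snapshot lives only for the duration of the s/r cycle, so no state
is kept around for a cpu that might never come back.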