Re: [patch 3/4] mempolicy: add MPOL_F_STATIC_NODES flag

From: David Rientjes
Date: Wed Feb 13 2008 - 04:37:13 EST


On Wed, 13 Feb 2008, Paul Jackson wrote:

> The infamous unpublished (except to a few) patch I drafted on Christmas
> (Dec 25, 2007) basically added two new modes for how mempolicy
> nodemasks were to be resolved:
> 1) a static, no remap, mode, such as in this present patchset, and
> 2) a cpuset relative mode that correctly handled the case of cpusets
> growing larger (increased numbers of nodes.)
>
> I'd like to persuade you to add case (2) as well. But I suspect,
> given that case (2) was never as compelling to you as it was to me
> in our December discussions, that I'll have little luck doing that.
>

MPOL_F_STATIC_NODES already handles the second case because you can
specify nodes that aren't currently accessible because of your cpuset in
the hopes that eventually they will become accessible. It's possible to
pass a nodemask with all bits set so that when the cpuset's set of nodes
expand, the mempolicy is effected over the intersection of the two on
rebind.

Other than that, perhaps if you can elaborate more on what you imagined
supporting as far as cpusets growing larger (or supporting node hotplug,
which is the same type of problem), we can discuss that further before I
resend my patches.

> Notes and questions on your current patchset:
>
> 1) It looks like you -add- a second nodemask to struct mempolicy:
> nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */
> nodemask_t user_nodemask; /* nodemask passed by user */
>
> I believe you can overlay these two nodemasks using a union, and avoid making
> struct mempolicy bigger.
>

Ahh, since policy->cpuset_mems_allowed is only meaningful in the
non-MPOL_F_STATIC_NODES case, that probably will work. For the purposes
of this patchset, we can certainly do that. I'm wondering whether future
expansions will require them to be separated again, however.

> 2) Those 'mpol_mode()' wrappers on all those mempolicy->policy
> evaluations look dangerous to me. It looks like a code bug
> waiting to happen. Unless I miss my guess, if you forget one of
> those wrappers (or someone making a future change misses one), no
> one will notice until sometime at runtime, someone makes use of the
> new fangled MPOL_F_STATIC_NODES mode in some new situation, and we
> end up executing code that we didn't expect to execute just then.
>
> I urge you to reconsider, and keep it so that the 'policy' field of struct
> mempolicy naturally evaluates to just the policy. There should be just one
> pair of places, on entry to, and exit from, the kernel, where the code is
> aware of the need to pack the mode and the policy into a single word.
>

Ok.

> 3) Does your patchset allow the user to specify a nodemask that includes nodes
> outside of the current cpuset with a MPOL_F_STATIC_NODES mode? At first
> glance, I didn't see that it did, but I should have looked closer. This
> strikes me as an essential aspect of this mode.
>

It does, and that's why I was a little curious about your second case at
the beginning of this email.

The user's nodemask is always stored unaltered in policy->user_nodemask.
In mpol_new(), we intersect the current accessible nodes with that
nodemask to determine if there's even a mempolicy to effect. If not,
mpol_new() returns ERR_PTR(-EINVAL) and we fall back to the existing
policy (if any) for the VMA or task. Otherwise, we use the intersection
of those two nodemasks.

In mpol_rebind_policy() with MPOL_F_STATIC_NODES, we always intersect
policy->user_nodemask with the set of accessible nodes (nodemask_t
*newmask). So if we can now access nodes that we previously couldn't,
they are now part of the interleave nodemask. A similiar situation occurs
when building the zonelist for the MPOL_BIND case and you can see the
change I made to MPOL_PREFERRED in the incremental patch I sent you.

The only way that newly-accessible nodes will not become a part of the
mempolicy nodemask is when the user's nodemask and the set of accessible
nodes is disjoint when the policy was created.

It is arguable whether we want to support that behavior as well (and we
definitely could do it, it's not out of the scope of mempolicies). Lee
had specific requirements of rejecting nodemasks that had no nodes in the
intersection based on the current implementation, but we could continue
discussing the possibility of setting up mempolicies that are uneffected
when they are created and only become policy when they are rebound later.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/