Re: [PATCH] Do not use cpu_to_node() to find an offlined cpu's node.

From: David Rientjes
Date: Wed Oct 10 2012 - 16:30:23 EST


On Wed, 10 Oct 2012, Peter Zijlstra wrote:

> > If cpu_to_node() always returns a valid node id even if all cpus on the
> > node are offline, then the cpumask_of_node() implementation, which the
> > sched code is using, should either return an empty cpumask (if
> > node_to_cpumask_map[nid] isn't freed) or cpu_online_mask. The change in
> > behavior here occurred because
> > cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch in -mm doesn't
> > return a valid node id and forces it to return -1 so a kzalloc_node(...,
> > -1) falls back to allocating anywhere.
>
> I think that's broken semantics.. so far the entire cpu<->node mapping
> was invariant during hotplug. Changing that is going to be _very_
> interesting and cannot be done lightly.
>
> Because as I said, per-cpu memory is preserved over hotplug, and that
> has numa affinity.
>
> So for now, let me NACK that patch. You cannot go change stuff like
> that.
>

Agreed, that brings the nack count to 2 now. Andrew, please remove
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved.patch
cpu_hotplug-unmap-cpu2node-when-the-cpu-is-hotremoved-fix.patch
from -mm.

> > But if you only need cpu_to_node() when waking up to find a runnable cpu
> > for this NUMA information, then I think you can just change the
> > kzalloc_node() in alloc_{fair,rt}_sched_group() to do
> > kzalloc(..., cpu_online(cpu) ? cpu_to_node(cpu) : NUMA_NO_NODE).
>
> That's a confusing statement, the wakeup stuff and the
> alloc_{fair,rt}_sched_group() stuff are unrelated, although both sites
> might need fixing if we're going to go ahead with this.
>
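
For reference, the node selection suggested in the quoted text can be
modelled in plain userspace C as below. This is only a sketch: the
sketch_* names and the two small tables are made up for illustration
and stand in for cpu_to_node() and cpu_online(); nothing here is the
actual scheduler code. NUMA_NO_NODE is defined as -1, which is the
value the quoted text says makes kzalloc_node() allocate anywhere.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE   (-1)  /* "no preference": allocate from any node */
#define SKETCH_NR_CPUS 4     /* hypothetical size of this model */

/* Hypothetical stand-ins for cpu_to_node() and cpu_online(). */
static const int  sketch_cpu_node[SKETCH_NR_CPUS]      = { 0, 0, 1, 1 };
static const bool sketch_cpu_is_online[SKETCH_NR_CPUS] = { true, true, true, false };

/*
 * Pick the node to pass to a kzalloc_node()-style allocator: use the
 * cpu's node only while the cpu is online, otherwise express no
 * preference so the allocation can be satisfied from any node.
 */
static int sketch_alloc_node(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return sketch_cpu_is_online[cpu] ? sketch_cpu_node[cpu] : NUMA_NO_NODE;
}
```

In this model cpu 2 is online and yields node 1, while cpu 3 is
offline and yields NUMA_NO_NODE even though its mapping still says
node 1.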

The alternative is for node hot-remove to iterate over all possible cpus
and set cpu-to-node to NUMA_NO_NODE for every offlined cpu that maps to
that node. If cpu_online() is true for any of those cpus, then obviously
the node can't be offlined. We want to do this so that
kzalloc_node(..., cpu_to_node()) falls back to allocating from any node,
which it should, and because a subsequent node hot-add event that reuses
the same node id may not be adding the same node.
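
To make the idea concrete, here is a rough userspace model of that
hot-remove step. It is a sketch only, not a patch: the model_* names,
the array sizes and the -1 failure return are all assumptions for
illustration, and the real thing would walk the possible-cpu mask and
deal with locking and the per-arch cpu-to-node tables.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE  (-1)
#define MODEL_NR_CPUS 8      /* hypothetical number of possible cpus */

/* Hypothetical model state: cpus 0-3 on node 0, cpus 4-7 on node 1. */
static int  model_cpu_to_node[MODEL_NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };
static bool model_cpu_online[MODEL_NR_CPUS]  = { true, true, true, true,
                                                 false, false, false, false };

/*
 * Model of node hot-remove. Returns -1 and changes nothing if any cpu
 * mapped to the node is still online. Otherwise unmaps every cpu of
 * that node by setting its entry to NUMA_NO_NODE and returns 0.
 */
static int model_node_hot_remove(int nid)
{
	int cpu;

	assert(nid >= 0);

	/* First pass: a node with an online cpu cannot go away. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_to_node[cpu] == nid && model_cpu_online[cpu])
			return -1;

	/* Second pass: drop the stale mapping of all offlined cpus. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_to_node[cpu] == nid)
			model_cpu_to_node[cpu] = NUMA_NO_NODE;

	return 0;
}
```

With the state above, removing node 0 is refused because its cpus are
online, while removing node 1 succeeds and leaves cpus 4-7 mapped to
NUMA_NO_NODE. The cpu-to-node mapping of a merely offlined cpu is left
alone until its node actually goes away.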