Re: [PATCH 1/1] hotplug cpu: migrate a task within its cpuset

From: Oleg Nesterov
Date: Sat Aug 25 2007 - 05:45:21 EST


On 08/24, Andrew Morton wrote:
>
> On Fri, 24 Aug 2007 17:18:06 -0500
> Cliff Wickman <cpw@xxxxxxx> wrote:
>
> > When a cpu is disabled, move_task_off_dead_cpu() is called for tasks
> > that have been running on that cpu.
> >
> > Currently, such a task is migrated:
> > 1) to any cpu on the same node as the disabled cpu, which is both online
> > and among that task's cpus_allowed
> > 2) to any cpu which is both online and among that task's cpus_allowed
> >
> > It is typical of a multithreaded application running on a large NUMA system
> > to have its tasks confined to a cpuset so as to cluster them near the
> > memory that they share. Furthermore, it is typical to explicitly place such
> > a task on a specific cpu in that cpuset. And in that case the task's
> > cpus_allowed includes only a single cpu.
>
> operator error..
>
> > This patch would insert a preference to migrate such a task to some cpu within
> > its cpuset (and set its cpus_allowed to its entire cpuset).
> >
> > With this patch, migrate the task to:
> > 1) to any cpu on the same node as the disabled cpu, which is both online
> > and among that task's cpus_allowed
> > 2) to any online cpu within the task's cpuset
> > 3) to any cpu which is both online and among that task's cpus_allowed
>
> Wouldn't it be saner to refuse the offlining request if the CPU has tasks
> which cannot be migrated to any other CPU? I mean, the operator has gone
> and asked the machine to perform two inconsistent/incompatible things at
> the same time.

I don't think so (regardless of this patch and CONFIG_CPUSETS). Any user
can bind its process to (say) CPU 4. This shouldn't block cpu-unplug.

Now, let's suppose that this process is a member of some cpuset which
contains CPUs 3 and 4, and CPU 4 goes down.

Before this patch, process leaves its ->cpuset and migrates to some "random"
any_online_cpu(). With this patch it stays within ->cpuset and migrates to
CPU 3.

> Look at it this way. If we were to merge this patch then it would be
> logical to also merge a patch which has the following description:
>
> "if an process attempts to pin itself onto an presently-offlined CPU,
> the kernel will choose a different CPU according to <heuristics> and
> will pin the process to that CPU instead".

set_cpus_allowed() just returns -EINVAL in that case, this looks a bit
more logical.

Oleg.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/