Re: [PATCH v2 0/7] CPU hotplug, cpusets: Fix issues with cpusets handling upon CPU hotplug

From: Nishanth Aravamudan
Date: Fri May 04 2012 - 16:46:47 EST


On 04.05.2012 [22:14:16 +0200], Peter Zijlstra wrote:
> On Sat, 2012-05-05 at 01:28 +0530, Srivatsa S. Bhat wrote:
> > On 05/05/2012 12:54 AM, Peter Zijlstra wrote:
> >
> > >
> > >> Documentation/cgroups/cpusets.txt | 43 +++--
> > >> include/linux/cpuset.h | 4
> > >> kernel/cpuset.c | 317 ++++++++++++++++++++++++++++---------
> > >> kernel/sched/core.c | 4
> > >> 4 files changed, 274 insertions(+), 94 deletions(-)
> > >
> > > Bah, I really hate this complexity you've created for a problem that
> > > really doesn't exist.
> > >
> >
> >
> > Doesn't exist? Well, I believe we do have a problem and a serious one
> > at that too!
>
> Still not convinced,..
>
> > The heart of the problem can be summarized in 2 sentences:
> >
> > o During a CPU hotplug, tasks can move between cpusets, and never
> > come back to their original cpuset.
>
> This is a feature! You cannot say a task is part of a cpuset and then
> run it elsewhere just because things don't work out.
>
> That's actively violating the meaning of cpusets.

Tbh, I agree with you Peter, as I think that's how cpusets *should*
work. But I'll also reference `man cpuset`:

Not all allocations of system memory are constrained by cpusets,
for the following reasons.

If hot-plug functionality is used to remove all the CPUs that
are currently assigned to a cpuset, then the kernel will
automatically update the cpus_allowed of all processes attached
to CPUs in that cpuset to allow all CPUs. When memory hot-plug
functionality for removing memory nodes is available, a
similar exception is expected to apply there as well. In
general, the kernel prefers to violate cpuset placement, rather
than starving a process that has had all its allowed CPUs or
memory nodes taken off-line. User code should reconfigure
cpusets to only refer to online CPUs and memory nodes when using
hot-plug to add or remove such resources.

So cpusets are, per their own documentation, not hard-limits in the face
of hotplug.

I, personally, think we should just kill off tasks in cpuset-constrained
environments that are nonsensical (no memory, no cpus, etc.). But it
would seem we've already supported this behavior (inherit the parent in
the face of hotplug) in the past. Not sure we should break it ... at
least on the surface.

> > o Tasks might get pinned to a smaller number of cpus, unreasonably.
>
> -ENOPARSE, are you trying to say that when the set contains 4 cpus and
> you unplug one it's left with 3? Sounds pretty damn obvious, that's
> what unplug does, it takes a cpu away.

I think he's saying that it's pinned to 3 forever, even if that 4th CPU
is re-plugged.
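
To make that concrete, here is a toy user-space model of the behavior
as I understand it. It is not the kernel/cpuset.c code: the struct, the
function names and the "one bit per CPU in an unsigned long"
representation are all made up for illustration. The point is that the
set keeps only one mask, so intersecting it with the online mask on
unplug throws the information away.

```c
/* Toy model, hypothetical names: bit N set means CPU N is in the mask. */
typedef unsigned long cpumask_bits;

struct lossy_cpuset {
	cpumask_bits cpus_allowed;	/* the only record of the user's choice */
};

/* Unplug: drop @cpu from the online mask and shrink the set to match. */
static void lossy_cpu_down(struct lossy_cpuset *cs, cpumask_bits *online,
			   int cpu)
{
	*online &= ~(1UL << cpu);
	cs->cpus_allowed &= *online;	/* @cpu is forgotten here for good */
}

/* Re-plug: @cpu comes back online, but nothing puts it back in the set. */
static void lossy_cpu_up(struct lossy_cpuset *cs, cpumask_bits *online,
			 int cpu)
{
	(void)cs;	/* deliberately untouched */
	*online |= 1UL << cpu;
}
```

Starting from cpus_allowed = 0xF, unplugging CPU 3 leaves 0x7, and
re-plugging CPU 3 brings the online mask back to 0xF while the set
stays at 0x7.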

> > Both these are undesirable from a system-admin point of view.
>
> Both of those are fundamental principles you cannot change.

I see what you did there :)

<snip>

> > (Btw, Ingo had also suggested reworking this whole cpuset thing, while
> > reviewing the previous version of this fix.
> > http://thread.gmane.org/gmane.linux.kernel/1250097/focus=1252133)
>
> I still maintain that what you're proposing is wrong. You simply cannot
> run a task outside of the set for a little while and say that's ok.
>
> A set becoming empty while still having tasks is a hard error and not
> something that should be swept under the carpet. Currently we printk()
> and move them to the parent set until we find a set with !0 cpus. I
> think Paul Jackson was wrong there, he should have simply SIGKILL'ed the
> tasks or failed the hotplug.
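
For reference, the fallback Peter describes (move the tasks up to the
parent until a set with a non-empty CPU mask turns up) amounts to a walk
towards the root. A toy sketch, again with made-up names and not the
actual kernel code; the fprintf() stands in for the printk():

```c
#include <stddef.h>
#include <stdio.h>

/* Toy model, hypothetical names: bit N set means CPU N is in the mask. */
typedef unsigned long cpumask_bits;

struct toy_cpuset_node {
	const char *name;
	cpumask_bits cpus;
	struct toy_cpuset_node *parent;	/* NULL for the root */
};

/* Walk from @cs towards the root; NULL only if even the root is empty. */
static struct toy_cpuset_node *
nearest_nonempty_ancestor(struct toy_cpuset_node *cs)
{
	while (cs && cs->cpus == 0)
		cs = cs->parent;
	return cs;
}

/* Decide where a task in @cs should run, complaining if it has to move. */
static struct toy_cpuset_node *toy_rehome_task(struct toy_cpuset_node *cs)
{
	struct toy_cpuset_node *dst = nearest_nonempty_ancestor(cs);

	if (dst != cs)
		fprintf(stderr, "toy: cpuset %s has no CPUs, moving task to %s\n",
			cs->name, dst ? dst->name : "(none)");
	return dst;
}
```

With an empty leaf under an empty parent, the task lands in the root;
give the parent CPUs 0-1 (mask 0x3) and it lands there instead. Either
way, once moved, nothing in this model ever moves it back.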

Ah, excuse my quoting of the man-page; it seems you are already aware
of the pre-existing behavior.

So, I think I'm ok with putting the onus of all this on the
configuration owner -- don't configure or hotplug things stupidly.

In that case, though, we should change the cpusets implementation and
update the man-pages, etc., to match.

So I can see several solutions:

- Rework cpusets to not be so nice to the user and kill off tasks that
run in stupid cpusets. (to be written)
- Keep current behavior to be nice to the user, but make it much noisier
when the cpuset rules are being broken because they are stupid (do
nothing choice)
- Track/restore the user's setup when it's possible to do so. (this
patchset)
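
The third option could look roughly like the toy model below: keep the
user's configured mask separate from the mask actually used, and
recompute the latter on every hotplug event. This is my own sketch, not
necessarily how this patchset does it; the names are hypothetical, and
the "fall back to all online CPUs" rule is borrowed from the man-page
text quoted above.

```c
/* Toy model, hypothetical names: bit N set means CPU N is in the mask. */
typedef unsigned long cpumask_bits;

struct split_cpuset {
	cpumask_bits configured;	/* written only by the user */
	cpumask_bits effective;		/* derived; what tasks actually get */
};

/* Recompute after any hotplug event, given the current online mask. */
static void split_update(struct split_cpuset *cs, cpumask_bits online)
{
	cs->effective = cs->configured & online;
	/*
	 * Assumption for this sketch: if nothing configured is online,
	 * fall back to every online CPU, as cpuset(7) describes.
	 */
	if (cs->effective == 0)
		cs->effective = online;
}
```

Unplugging CPU 3 from a set configured as 0xF gives an effective mask
of 0x7, and re-plugging it restores 0xF, so the "3 forever" case goes
away because the configured mask is never overwritten. The cost is a
second mask per cpuset.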

I'm not sure any of these is "better" than the rest, but they probably
all have distinct merits.

How easy will it be for something like libvirt to handle that first
case? Can libvirt be modified to recognize that a VM has been killed due
to having an empty cpuset? And is that reasonable? What about other
users of cpusets (what are they?)?
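
On the libvirt side, what a supervisor can see for sure is the exit
status. A minimal POSIX sketch (the helper name is made up) that reaps a
child and reports whether SIGKILL ended it:

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: reap @pid and report how it ended.
 * Returns 1 if it was terminated by SIGKILL, 0 otherwise, -1 on error.
 */
static int was_sigkilled(pid_t pid)
{
	int status;
	pid_t r;

	do {
		r = waitpid(pid, &status, 0);	/* retry if a signal interrupts us */
	} while (r < 0 && errno == EINTR);
	if (r < 0)
		return -1;

	return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Note that the status only says *that* SIGKILL was delivered, not *why*;
telling an empty-cpuset kill apart from an OOM kill or an admin's
kill -9 would need some extra indication from the kernel, which is part
of the question above.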

Thanks,
Nish

--
Nishanth Aravamudan <nacc@xxxxxxxxxx>
IBM Linux Technology Center
