Re: [PATCH v6 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst

From: Tejun Heo
Date: Tue Aug 24 2021 - 15:04:38 EST


Hello,

On Tue, Aug 24, 2021 at 01:35:33AM -0400, Waiman Long wrote:
> Sorry for the late reply as I was on vacation last week.

No worries. Hope you enjoyed the vacation. :)

> > All the above ultimately says is that "a new task cannot be moved to a
> > partition root with no effective cpu", but I don't understand why this would
> > be a separate rule. Shouldn't the partition just stop being a partition when
> > it doesn't have any exclusive cpu? What's the benefit of having its
> > own failure mode?
>
> A partition with 0 cpus can be considered a special partition type for
> spawning child partitions. This can be temporary as the cpus will be given
> back when a child partition is destroyed.

But it can also happen by cpus going offline while the partition is
populated, right? Am I correct in thinking that a partition without cpu is
valid if its subtree contains cpus and invalid otherwise? If that's the
case, it looks like the rules can be made significantly simpler. The parent
cgroups never have processes anyway, so a partition is valid if its subtree
contains cpus, invalid otherwise.
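To make that rule concrete, here is a minimal sketch in C. It assumes a simplified, hypothetical cgroup tree: struct cg_node, subtree_has_cpus() and partition_valid() are illustrative names only, not kernel structures or functions. A partition is treated as valid iff it or any descendant holds at least one cpu.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of a cgroup: "cpus" is a bitmask of the
 * effective cpus held directly by this cgroup. Not a kernel structure.
 */
struct cg_node {
	unsigned long cpus;
	struct cg_node *children[4];
	size_t nr_children;
};

/* True if this cgroup or any of its descendants holds at least one cpu. */
static bool subtree_has_cpus(const struct cg_node *cg)
{
	if (cg->cpus)
		return true;
	for (size_t i = 0; i < cg->nr_children; i++)
		if (subtree_has_cpus(cg->children[i]))
			return true;
	return false;
}

/* Proposed rule: a partition is valid iff its subtree contains cpus. */
static bool partition_valid(const struct cg_node *cg)
{
	return subtree_has_cpus(cg);
}
```

With this rule, a parent partition that has handed all of its cpus to a child partition stays valid, while one whose whole subtree lost its cpus (e.g. to hotplug) becomes invalid, with no separate "no effective cpu" rule needed.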

> > So, I think this definitely is a step in the right direction but still seems
> > to be neither here nor there. Before, we pretended that we could police the
> > input when we couldn't. Now, we're changing the interface so that it
> > includes configuration failures as an integral part; however, we're still
> > policing some particular inputs while letting other inputs pass through and
> > trigger failures and why one is handled one way while the other differently
> > seems rather arbitrary.
> >
> The cpu_exclusive and load_balance flags are attributes associated directly
> with the partition type. They are not affected by cpu availability or
> changing of cpu list. That is why they are kept even when the partition
> becomes invalid. If we have to remove them, it will be equivalent to changing
> partition back to member and we may not need an invalid partition type at
> all. Also, we will not be able to revert back to partition again when the
> cpus become available.

Oh, yeah, I'm not saying to lose those states. What I'm trying to say is
that the rules and failure modes seem a lot more complicated than they need
to be. If the configuration becomes invalid for whatever reason, transition
the partition into invalid state and report why. If the situation resolves
for whatever reason, transition it back to valid state. Shouldn't that work?
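A rough sketch of that model in C, assuming a hypothetical partition_update() hook that gets called on any change affecting validity (cpu hotplug, cpuset.cpus writes, child partition changes). The names and the string-based reason are illustrative only. The partition just flips between valid and invalid, recording why it is invalid, while the cpu_exclusive and load_balance attributes are left untouched so it can revert once the situation resolves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum prs_state { PRS_VALID, PRS_INVALID };

/* Hypothetical partition state; not the kernel's struct cpuset. */
struct partition {
	enum prs_state state;
	const char *invalid_reason;	/* NULL while valid */
	bool cpu_exclusive;		/* preserved across transitions */
	bool load_balance;		/* preserved across transitions */
};

/*
 * Re-evaluate after any relevant change. "reason" is NULL if the current
 * configuration is valid, otherwise a description of why it is not.
 */
static void partition_update(struct partition *p, const char *reason)
{
	if (reason) {
		p->state = PRS_INVALID;
		p->invalid_reason = reason;
	} else {
		p->state = PRS_VALID;
		p->invalid_reason = NULL;
	}
}
```

There is a single failure mode here: whatever makes the configuration invalid is reported through the same state and reason, rather than some inputs being rejected up front and others triggering failures later.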

Thanks.

--
tejun