Re: [GIT PULL] isolation: 1Hz residual tick offloading v3

From: Frederic Weisbecker
Date: Tue Jan 16 2018 - 10:41:08 EST


On Fri, Jan 12, 2018 at 02:18:13PM -0500, Luiz Capitulino wrote:
> On Thu, 4 Jan 2018 05:25:32 +0100
> Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
>
> > Ingo,
> >
> > Please pull the sched/0hz branch that can be found at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
> > sched/0hz
> >
> > HEAD: 9e932b2cc707209febd130978a5eb9f4a943a3f4
> >
> > --
> > Now that scheduler_tick() has become resilient towards the absence of
> > ticks, current->sched_class->task_tick() is the last piece that needs
> > at least 1Hz tick to keep scheduler stats alive.
> >
> > This patchset adds a flag to the isolcpus boot option to offload the
> > residual 1Hz tick. This way the nohz_full CPUs don't have anymore tick
> > (assuming nothing else requires it) as their residual 1Hz tick is
> > offloaded to the housekeepers.
> >
> > For quick testing, say on CPUs 1-7:
> >
> > "isolcpus=nohz_offload,domain,1-7"
>
> Sorry for being very late to this series, but I've a few comments to
> make (one right now and others in individual patches).
>
> Why are we extending isolcpus= given that it's a deprecated interface?
> Some people have already moved away from isolcpus= now, but with this
> new feature they will be forced back to using it.

I tried to remove isolcpus, or at least change the way it works so that its
effects are reversible (i.e. affine the init task instead of isolating domains),
but that got nacked because userspace relies on the current behaviour.

That's when I realized that kernel parameters are like userspace ABIs:
they can't be removed easily, whether we deprecate them or not.

Also, I needed to be able to control the various isolation features, and
nohz_full= is the wrong place to do that: nohz_full is really just an
isolation feature like the others. nohz_full= should really just imply
full dynticks, not watchdog, workqueue or tilegx NAPI isolation...

So isolcpus= is now the place where we control the isolation features
and nohz is one of them.
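
To illustrate the flag-prefixed syntax, here is a minimal Python sketch (not
kernel code) that splits an isolcpus= value into its feature flags and its CPU
list. The helper name parse_isolcpus is made up for this example. It assumes
that every leading token starting with a letter is a flag, it does not check
flag names, and it only handles single CPUs and "a-b" ranges.

```python
def parse_isolcpus(value):
    """Split an isolcpus= value into (flags, cpus).

    Hypothetical helper for illustration only. Assumes leading tokens
    that start with a letter are flags and that the remainder is a
    plain CPU list of single CPUs and "a-b" ranges.
    """
    tokens = value.split(",")
    flags = []
    # Consume leading alphabetic tokens as feature flags.
    while tokens and tokens[0][:1].isalpha():
        flags.append(tokens.pop(0))
    cpus = set()
    for tok in tokens:
        if "-" in tok:
            lo, hi = tok.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(tok))
    return flags, sorted(cpus)


# The quick-testing example from the pull request.
flags, cpus = parse_isolcpus("nohz_offload,domain,1-7")
print(flags, cpus)
```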

The complaint about isolcpus is that its result is immutable. I'm thinking about
making it modifiable through cpusets, but I only see two possible solutions:

- Make the root cpuset modifiable
- Create a directory called "isolcpus" visible on the first cpuset mount
  and move all processes there.

> What about just adding the new functionality to nohz_full=? That is,
> no new options, just make the tick go away since this has always been
> what nohz_full= was intended to do?

We can, or we can have isolcpus=nohz do it, as both do almost the same thing.

But I'm worried about the overhead for people used to nohz_full= once
they upgrade their kernels and see those workqueues running once per second.

We can still affine those workqueues (in fact the whole unbound workqueue
mask) outside the nohz_full range. Still, current users may be surprised
by that new overhead on housekeeping CPUs...
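
As a rough illustration of affining the unbound workqueue mask outside the
isolated range, the Python sketch below computes a hex cpumask that covers
only the housekeeping CPUs. The helper name housekeeping_mask is hypothetical.
The sketch assumes the result would be written by root to the existing
/sys/devices/virtual/workqueue/cpumask file, and it formats the mask as
comma-separated 32-bit hex groups, most significant group first.

```python
def housekeeping_mask(nr_cpus, isolated):
    """Return a hex cpumask string for every CPU not in `isolated`.

    Hypothetical helper for illustration only. Output is split into
    32-bit groups, most significant first, separated by commas.
    """
    isolated = set(isolated)
    bits = 0
    for cpu in range(nr_cpus):
        if cpu not in isolated:
            bits |= 1 << cpu
    # Split the mask into 32-bit groups, least significant first.
    groups = []
    while True:
        groups.append(bits & 0xFFFFFFFF)
        bits >>= 32
        if not bits:
            break
    groups.reverse()
    # Only the leading group may drop its zero padding.
    return ",".join([f"{groups[0]:x}"] + [f"{g:08x}" for g in groups[1:]])


# 8 CPUs with 1-7 isolated leaves CPU 0 as the only housekeeper: "1"
print(housekeeping_mask(8, range(1, 8)))
```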