Re: [PATCH 0/4] Allow CPU0 to be nohz full

From: Nicholas Piggin
Date: Thu Apr 11 2019 - 23:16:13 EST


Paul E. McKenney's on April 12, 2019 1:42 am:
> On Tue, Apr 09, 2019 at 07:21:54PM +1000, Nicholas Piggin wrote:
>> Thomas Gleixner's on April 6, 2019 3:54 am:
>> > On Fri, 5 Apr 2019, Nicholas Piggin wrote:
>> >> Thomas Gleixner's on April 5, 2019 12:36 am:
>> >> > On Thu, 4 Apr 2019, Nicholas Piggin wrote:
>> >> >
>> >> >> I've been looking at ways to fix suspend breakage with CPU0 as a
>> >> >> nohz CPU. I started looking at various things like allowing CPU0
>> >> >> to take over do_timer again temporarily or allowing nohz full
>> >> >> to be stopped at runtime (that is quite a significant change for
>> >> >> little real benefit). The problem then was having the housekeeping
>> >> >> CPU go offline.
>> >> >>
>> >> >> So I decided to try just allowing the freeze to occur on non-zero
>> >> >> CPU. This seems to be a lot simpler to get working, but I guess
>> >> >> some archs won't be able to deal with this? Would it be okay to
>> >> >> make it opt-in per arch?
>> >> >
>> >> > It needs to be opt in. x86 will fall on its nose with that.
>> >>
>> >> Okay I can add that.
>> >>
>> >> > Now the real interesting question is WHY do we need that at all?
>> >>
>> >> Why full nohz for CPU0? Basically this is how their job system was
>> >> written and used, testing nohz full was a change that came much later
>> >> as an optimisation.
>> >>
>> >> I don't think there is a fundamental reason an equivalent system
>> >> could not be made that uses a different CPU for housekeeping, but I
>> >> was assured the change would be quite difficult for them.
>> >>
>> >> If we can support it, it seems nice if you can take a particular
>> >> configuration and just apply nohz_full to your application processors
>> >> without any other changes.
>> >
>> > This wants an explanation in the patches.
>>
>> Okay.
>>
>> > And patch 4 has in the changelog:
>> >
>> > nohz_full has been successful at significantly reducing jitter for a
>> > large supercomputer customer, but their job control system requires CPU0
>> > to be for housekeeping.
>> >
>> > which just makes me dazed and confused :)
>> >
>> > Other than some coherent explanation and making it opt in, I don't think
>> > there is a fundamental issue with that.
>>
>> I will try to make the changelogs less jibberish then :)
>
> Maybe this is all taken care of now, but do the various clocks stay
> synchronized with wall-clock time if all CPUs are in nohz_full mode?
> At one time, at least one CPU needed to keep its scheduler-clock
> interrupt going in order to keep things in sync.

Ah, that may not have been clear in the changelog -- the series still
requires at least one CPU present at boot time to be a housekeeper
that keeps things running. So conceptually this doesn't change
anything about runtime behaviour; the main change is the boot-time
handoff from CPU0.

> The ppc timebase register might make it possible to do this without any
> scheduler-clock interrupts, but figured I should check. ;-)

I don't know all this code too well, but if we really wanted to push
things, I think nohz-full could be more aggressive in shutting down
the tick and possibly even avoiding a housekeeping CPU completely, but
you would have to do that work on user->kernel switch too. Likely the
complexity and overhead are not worthwhile.

Another thing is you might be able to avoid the jiffies tick completely
and change jiffies to read from the timebase register. Lots of
interesting things we could try.
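
As a rough illustration of the jiffies-from-timebase idea (a userspace
sketch, not kernel code): the helper name tb_to_jiffies(), the 512 MHz
timebase frequency and the HZ value of 250 are all assumptions made up
for the example, and a real implementation would have to read the
frequency from the platform and deal with every existing jiffies user.

```c
#include <stdint.h>

/* Assumed values for illustration only: a 512 MHz timebase and HZ=250. */
#define SKETCH_TB_FREQ_HZ 512000000ULL
#define SKETCH_HZ         250ULL

/*
 * Hypothetical helper: derive a jiffies-style count from a free-running
 * timebase counter instead of incrementing it from a periodic tick.
 * tb_boot is the counter value sampled at boot. Unsigned subtraction
 * keeps the elapsed count correct if the counter wraps once.
 */
static inline uint64_t tb_to_jiffies(uint64_t tb_now, uint64_t tb_boot)
{
    const uint64_t ticks_per_jiffy = SKETCH_TB_FREQ_HZ / SKETCH_HZ;

    return (tb_now - tb_boot) / ticks_per_jiffy;
}
```

The point of the sketch is only that the count becomes a pure function
of the counter, so no CPU has to take a periodic interrupt to advance it.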

Thanks,
Nick