Re: [PATCH] Skew tick for systems with a large number of processors
From: Thomas Gleixner
Date: Thu Jul 03 2025 - 16:59:52 EST
On Thu, Jul 03 2025 at 07:51, Christoph Lameter wrote:
> On Thu, 3 Jul 2025, Thomas Gleixner wrote:
>> It's not rocket science to validate whether these power saving concerns
>> still apply and to reach out to people who have been involved in this
>> and ask them to revalidate. I just Cc'ed Arjan for you.
>
> They definitely apply on an Android phone with fewer cores. There you
> would want to reduce the number of wakeups as much as possible to
> conserve power, so it needs synchronized mode.
That's kinda obvious, but with the new timer migration model, which
stops placing timers by crystal-ball logic, this might no longer be
true and needs actual data to back up that claim.
> That is why my initial thought was to make it dependent on the number of
> active processors.
>
>> There is only a limited range of scenarios, which need to be looked at:
>>
>> - Big servers and the power saving issues on lightly loaded
>> machines
>
> If it is only a few active cores and the system is basically idle then
> it is better to have a synchronized tick but if the system has lots of
> active processors then the tick should be skewed.
I agree with the latter, but is your 'few active cores' claim backed by
actual data taken from a current kernel or based on historical evidence
and hearsay?
> So maybe one idea would be to have a counter of active ticks and skew
> them if that number gets too high.
The idea itself is not that horrible. Though we should tap into the
existing accounting resources to figure that out instead of adding yet
another ill-defined global counter to the mess. All the required metrics
should be there already.
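A minimal sketch of the counter idea, assuming the number of busy CPUs is derived from per-CPU state that already exists rather than from a new global counter. `cpu_busy[]`, `count_busy_cpus()`, `should_skew_tick()` and the threshold of 8 are all hypothetical; choosing a real threshold would need exactly the kind of data asked for above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; a real value would need measurements. */
#define SKEW_THRESHOLD 8

/*
 * cpu_busy[] stands in for per-CPU state the kernel already keeps
 * (an assumption), so no new global counter is introduced.
 */
static unsigned int count_busy_cpus(const bool *cpu_busy, unsigned int nr_cpus)
{
	unsigned int busy = 0;

	assert(nr_cpus == 0 || cpu_busy != NULL);
	for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
		if (cpu_busy[cpu])
			busy++;
	return busy;
}

/* Skew the tick once enough CPUs are busy; keep it aligned otherwise. */
static bool should_skew_tick(const bool *cpu_busy, unsigned int nr_cpus)
{
	return count_busy_cpus(cpu_busy, nr_cpus) >= SKEW_THRESHOLD;
}
```

The weakness is visible in the sketch: the decision is global, so every CPU has to agree on one snapshot of a value that changes constantly.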
Actually, it should be solvable if you look at it purely from a per-CPU
perspective. This assumes that NOHZ_IDLE is active, because if it is not,
then you can just go and skew unconditionally.
If a CPU is busy, then it just arms the tick skewed. If it goes idle,
then it looks at the expected idle time, which is what NOHZ does already
today. If it decides to stop the tick until the next timer list expires,
then it aligns it. Earlier-expiring high-resolution timers obviously
override the initial decision, but that's not much different from what
is happening today already.
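The per-CPU decision above can be modeled in plain user-space C. This is a sketch, not kernel code: `next_tick_expiry()`, `tick_skew_offset()` and the HZ=250 tick rate are assumptions. The skew formula (half a tick period spread across the CPUs, scaled by the CPU number) is modeled on how the existing `skew_tick` option offsets the per-CPU tick timer in kernel/time/tick-sched.c. Earlier-expiring high-resolution timers are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tick rate; the real value depends on CONFIG_HZ. */
#define HZ           250
#define NSEC_PER_SEC 1000000000ULL
#define TICK_NSEC    (NSEC_PER_SEC / HZ)   /* 4 ms at HZ=250 */

/*
 * Skew offset for one CPU: half a tick period divided evenly across
 * nr_cpus, scaled by the CPU number.
 */
static uint64_t tick_skew_offset(unsigned int cpu, unsigned int nr_cpus)
{
	assert(nr_cpus > 0 && cpu < nr_cpus);
	return (TICK_NSEC / 2) / nr_cpus * cpu;
}

/*
 * Hypothetical per-CPU decision. Returns the absolute time (ns) at
 * which this CPU should arm its next tick.
 */
static uint64_t next_tick_expiry(uint64_t now, unsigned int cpu,
				 unsigned int nr_cpus, bool nohz_idle,
				 bool cpu_idle, uint64_t next_timer)
{
	/* First tick-period boundary strictly after 'now'. */
	uint64_t boundary = (now / TICK_NSEC + 1) * TICK_NSEC;

	/*
	 * Going idle with NOHZ_IDLE and the next timer-wheel expiry more
	 * than one tick away: stop the tick until then and round the
	 * wakeup up to a tick boundary without skew, so idle CPUs with
	 * timers in the same period wake up together.
	 */
	if (nohz_idle && cpu_idle && next_timer > boundary)
		return (next_timer + TICK_NSEC - 1) / TICK_NSEC * TICK_NSEC;

	/* Busy CPU, short idle period, or NOHZ_IDLE off: tick skewed. */
	return boundary + tick_skew_offset(cpu, nr_cpus);
}
```

With HZ=250 and 4 CPUs, a busy CPU 2 at t = 1 ms arms its tick at 5 ms (the 4 ms boundary plus 1 ms of skew). The same CPU going idle with its next timer just past 10 ms stops the tick and aligns the wakeup to the 12 ms boundary.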
>> - Battery operated devices
>
> These usually have 1-4 cores. So synchronized is obviously the best.
Same question as above.
>> If we could have predicted the future and the consequences of ad hoc
>> decisions, we wouldn't have had a BKL, which took only 20 years of
>> effort to get rid of (except for the well hidden leftovers in tty).
>
> Oh the BKL was good. Synchronization was much faster after all and less
> complex. I am sure a BKL approach on small systems would still improve
> performance.
Feel free to scale back to 4 cores and enjoy the undefined BKL
semantics forever in your own fork of 2.2.final :)
Thanks,
tglx