Re: [PATCH 0/3] timekeeping: Improved NOHZ frequency steering (v2)

From: John Stultz
Date: Fri May 12 2017 - 13:28:52 EST


On Fri, May 12, 2017 at 8:14 AM, Miroslav Lichvar <mlichvar@xxxxxxxxxx> wrote:
> On Tue, Jul 15, 2014 at 09:02:38PM -0700, John Stultz wrote:
>> On 07/08/2014 04:08 AM, Miroslav Lichvar wrote:
>> > I spent some time trying to figure out a workaround for the nanosecond
>> > rounding, but I didn't find anything that wouldn't complicate the mult
>> > adjustment logic and bring back the problems which the direct division
>> > approach is supposed to solve.
>> >
>> > It seems it may be a while before the old vsyscalls are fixed. How
>> > about including only the first two patches from this set for now?
>
>> So thanks for the ping here. If you're happy with the first two as an
>> initial step, I'd be willing to try to push those in. The only trouble
>> is there's a whole lot of timekeeping churn headed for 3.17 that Thomas
>> has cooked up. While there isn't likely to be direct conflicts in the
>> changes, I get nervous about mixing too many changes in subtle code at once.
>
> I'm sorry for digging up this skeleton. Are we any closer to being
> able to remove CONFIG_GENERIC_TIME_VSYSCALL_OLD, which prevented
> simplifying the steering logic of the internal clock?

Yea. I think we've waited for a few years w/o action on this from the
ppc and ia64 folks.

Probably time to put in a compile warning making it clear it's going to
be removed in the next release or two, to force the issue.


> With the new PTP KVM clock the problem can be easily observed. Here is
> a graph of the offset and frequency as measured by chronyd when
> configured to synchronize the guest's clock to the host using the
> virtual PHC. In the middle is the point when the NTP error reached
> zero. The apparent frequency jumped by about 50 ppb and the offset
> improved by an order of magnitude.
>
> https://mlichvar.fedorapeople.org/tmp/kvm_phc.png
>
> I see this with real PHCs and PTP/NTP synchronization too. It's very
> confusing when the timekeeping changes so much for no apparent reason.
> If we can't remove the old vsyscalls yet, I was thinking maybe a new
> flag could be added to adjtimex to report the error, so applications
> can at least detect this problem and consider stepping the clock in
> order to reset the error?
>
> Thoughts?

I'd rather not have short-term hacks that applications have to adapt to.
So I think we should drop the old vsyscall method in the near term.
Sorry this sort of fell off my radar.
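
For context, here is a rough sketch of what userspace can already read
through adjtimex() today. It assumes glibc's <sys/timex.h>; the helper
names read_clock_state() and scaled_ppm_to_ppb() are made up for
illustration and are not from any posted patch. As far as I can tell,
none of the fields returned this way carries the accumulated internal
error discussed above, which is why reporting it would need an
interface change.

```c
#include <string.h>
#include <sys/timex.h>

/* struct timex reports freq as ppm with a 16-bit fractional part
 * (ppm << 16); convert it to ppb so it is easier to compare with
 * numbers like the ~50 ppb jump mentioned above. */
double scaled_ppm_to_ppb(long freq)
{
	return (double)freq * 1000.0 / 65536.0;
}

/* Hypothetical helper: read-only query of the kernel clock state.
 * With modes == 0 nothing is changed and no privileges are needed.
 * Returns the clock state (TIME_OK, ...) or -1 with errno set. */
int read_clock_state(struct timex *tx)
{
	memset(tx, 0, sizeof(*tx));
	tx->modes = 0;
	return adjtimex(tx);
}
```

A caller would check the return value for -1, then look at tx.offset
(microseconds, or nanoseconds if STA_NANO is set in tx.status) and
scaled_ppm_to_ppb(tx.freq).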

Do you have an updated set of patches you want to get ready to address
the issue? We can get those reviewed while we increase the pressure on
dropping the OLD_VSYSCALL implementations.

thanks
-john