Re: TSC to Mono-raw Drift

From: Thomas Gleixner
Date: Thu Nov 01 2018 - 13:41:25 EST


Miroslav,

On Wed, 24 Oct 2018, Miroslav Lichvar wrote:
> On Tue, Oct 23, 2018 at 11:31:00AM -0700, John Stultz wrote:
> > On Fri, Oct 19, 2018 at 3:36 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote:
> > > On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >> On Fri, 19 Oct 2018, John Stultz wrote:
> > >>> We might be able to reduce the degree in this case, but I worry the
> > >>> extra complexity may only cause problems for others.
> > >>
> > >> Is it really that complex to add a fixed correction value periodically?
>
> The error is too large to be corrected by stepping on clock updates.
> For a typical TSC frequency we have multiplier in the range of few
> millions, so that's a frequency error of up to few hundred ppb. In the
> old days when the clock was updated 1000 times per second that would
> be hidden in the resolution of the clock, but now with tickless
> kernels those steps would be very noticeable.
>
> If the multiplier was adjusted in the same way as the non-raw clock,
> there wouldn't be any steps in time, but there would be steps in
> frequency and the time error would be proportional to the update
> interval. For a clock updated once per second that's an error of up to
> a few hundreds of nanoseconds.

That only happens when the system was completely idle for a second and in
that case it's a non issue because the clock is updated before it's
used. So nothing will be able to observe the time jumping forward by a few
or even a few hundreds of nanoseconds. For the regular case, where CPUs are
busy and the update happens 100/250/1000 times per second, the jump forward
will not be noticeable at all.
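
To put rough numbers on the figures quoted above, here is a minimal sketch of
the arithmetic. It assumes a hypothetical 3 GHz counter and a shift of 24; the
helper names hz_to_mult() and mult_error_ppb() are made up for illustration.
The rounding is similar in spirit to the kernel's clocksource_hz2mult(), but
this is not the code path the TSC clocksource actually takes.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: pick the integer multiplier so that
 * ns = (cycles * mult) >> shift, rounded to the nearest integer.
 */
uint32_t hz_to_mult(uint64_t hz, uint32_t shift)
{
    assert(hz != 0 && shift < 32);
    uint64_t tmp = NSEC_PER_SEC << shift;
    tmp += hz / 2;                  /* round to nearest */
    return (uint32_t)(tmp / hz);
}

/*
 * Frequency error caused by mult being an integer. The result is in ns
 * gained (positive) or lost (negative) per real second, which is
 * numerically the same as ppb.
 */
double mult_error_ppb(uint64_t hz, uint32_t mult, uint32_t shift)
{
    assert(shift < 32);
    /* Shifted ns reported after exactly hz cycles, i.e. one real second. */
    int64_t reported = (int64_t)(hz * mult);
    /* Shifted ns an ideal clock would report after one second. */
    int64_t ideal = (int64_t)(NSEC_PER_SEC << shift);
    return (double)(reported - ideal) / (double)(1ULL << shift);
}
```

With those assumed values, hz_to_mult(3000000000, 24) gives 5592405, a
multiplier of a few million, and mult_error_ppb() reports roughly -59.6 ppb.
That is about 60 ns lost per second, so it is also the size of the forward
jump if the error is corrected once per second.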

> I agree with John. I think the raw monotonic clock should be stable.

It is stable. It's still monotonically increasing.

> It would help if we better understood the use case. If I needed a
> clock that ticks in an exact proportion to the tsc, why wouldn't I use
> the tsc directly? Is this about having a fall back in case the tsc
> cannot be used (e.g. due to unsynchronized CPUs)?
>
> If the frequency error was exported, it could be compensated where
> necessary. Maybe that would work for the original poster?

I'd rather not go there and have yet another magic knob to deal with, and we
would need to have the same thing in the kernel as well.

> A better fix might be to modify the calculation of time to use a
> second multiplier, effectively increasing its resolution. However,
> that would slow down all users of the clock.

Right, and we carefully try to avoid that.

I really would like to give the adjustment a try. It should solve the
problem Christopher cares about and not have any side effects on other
users of monotonic raw including NTP/PTP etc. Famous last words ....
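
For illustration only, the idea of adding a fixed correction periodically
could be sketched roughly as below. This is not a patch and not how the
kernel's timekeeping code is structured. The struct and function names, the
choice to round the multiplier down so the correction is never negative, and
the once-per-hz-cycles correction period are all assumptions made for the
sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical raw clock state; time is kept in "shifted ns" (ns << shift). */
struct raw_clock {
    uint64_t hz;          /* nominal counter frequency */
    uint32_t mult, shift; /* ns = (cycles * mult) >> shift */
    uint64_t corr;        /* shifted ns to add once per hz cycles */
    uint64_t pending;     /* cycles seen since the last correction */
    uint64_t sec;         /* whole seconds accumulated */
    uint64_t shifted_ns;  /* sub-second part, in shifted ns */
};

void raw_clock_init(struct raw_clock *c, uint64_t hz, uint32_t shift)
{
    assert(hz != 0 && shift < 32);
    c->hz = hz;
    c->shift = shift;
    /* Round down (assumption) so the clock only ever needs to jump forward. */
    c->mult = (uint32_t)((NSEC_PER_SEC << shift) / hz);
    /* Exact shortfall, in shifted ns, after hz cycles (one real second). */
    c->corr = (NSEC_PER_SEC << shift) - hz * c->mult;
    c->pending = 0;
    c->sec = 0;
    c->shifted_ns = 0;
}

/* Called at each clock update with the cycles elapsed since the last one.
 * Assumes cycles * mult fits in 64 bits (a few seconds' worth is fine). */
void raw_clock_accumulate(struct raw_clock *c, uint64_t cycles)
{
    c->shifted_ns += cycles * c->mult;
    c->pending += cycles;
    while (c->pending >= c->hz) {       /* one correction per hz cycles */
        c->pending -= c->hz;
        c->shifted_ns += c->corr;
    }
    while (c->shifted_ns >= (NSEC_PER_SEC << c->shift)) {
        c->shifted_ns -= NSEC_PER_SEC << c->shift;
        c->sec++;
    }
}

uint64_t raw_clock_ns(const struct raw_clock *c)
{
    return c->sec * NSEC_PER_SEC + (c->shifted_ns >> c->shift);
}
```

With an assumed 3 GHz counter, a shift of 24 and updates every 12,000,000
cycles (250 per second), this sketch reads exactly 1,000,000,000 ns after
3,000,000,000 cycles, where the uncorrected conversion would read
999,999,940 ns. Readers of the clock still use the single mult/shift pair, so
the extra work happens only in the update path.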

Thanks,

tglx