Re: TSC to Mono-raw Drift

From: John Stultz
Date: Fri Oct 19 2018 - 18:37:01 EST


On Fri, Oct 19, 2018 at 1:50 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> John,
>
> On Fri, 19 Oct 2018, John Stultz wrote:
>> On Fri, Oct 19, 2018 at 11:57 AM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> > I don't think you need complex oscillation for that. The error is constant
>> > and small enough that it is a fractional nanoseconds thing with an interval
>> > <= 1s. So you can just add that in a regular interval. Due to it being
>> > small you can't observe time jumping I think.
>>
>> Well, from the examples the trouble is we seem to be a bit fast,
>> rather than slow.
>> So we'll have to reduce mult by one, and rework the calculations, but
>> maybe something like this (correcting the raw_interval value) would
>> work.
>
> Shouldn't be rocket science. It's a one off calculation of adjustment value
> and maybe the period at which the correction happens.
>
>> But this also sort of breaks the fundamental argument that the raw clock
>> is a simple mult/shift transformation of the underlying clocksource
>> counter. It's not the accuracy of the clock but the consistency that
>> was key.
>>
>> The counter argument is that the raw clock is abstracting the
>> underlying hardware so folks who would have used the TSC directly can
>> now use the raw clock and have a generic abstracted hardware-counter
>> interface. So userland shouldn't really be worried about the
>> occasional injections made since they shouldn't be trying to
>> re-generate the abstraction from the hardware themselves. <--
>> Remember this point as we move to the next comment:)
>>
>> > The end result is 'correct', or as correct as it can be in relation to
>> > real nanoseconds. :)
>> >
>> >> I guess I'd want to understand more of the use here and the need to
>> >> tie the raw clock back to the hardware counter it abstracts.
>> >
>> > The problem there is ART which is distributed to PCIe devices and ART time
>> > stamps are exposed in various ways. ART has a fixed ratio vs. TSC so there
>> > is a reasonable expectation that MONOTONIC_RAW is accurate.
>>
>> Which is maybe sort of my issue here. The raw clock provided an
>> abstraction away from the hardware for generic usage, but then it's
>> being re-used with other un-abstracted hardware references. So unless
>> they use the same method of transformation, there will be problems (of
>> varying degree).
>
> OTOH. If people use the CPUID provided frequency information and the TSC
> from userspace then they get different results which is contrary to the
> goal of providing them an abstracted way of doing it.

But that's my point. If they are pulling time values from the hardware
directly, that's unabstracted. I'm not sure it's smart to be comparing
the abstracted and unabstracted time stamps if you're worried about
precision. They are sort of two separate (though similar) time
domains.
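
To put a rough number on how far the two domains can sit apart, here is
a minimal sketch. It assumes a round-to-nearest mult (similar in spirit
to the kernel's clocksource_hz2mult()) and reports the error that
mult/shift accumulates over one second's worth of cycles. The helper
names, the 1.5 GHz and 3 GHz frequencies, and shift = 24 are made up
for illustration and are not taken from any real system.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Round-to-nearest mult for a given counter frequency and shift. */
static uint64_t hz_to_mult(uint32_t freq_hz, uint32_t shift)
{
	assert(freq_hz != 0 && shift < 32);
	return ((NSEC_PER_SEC << shift) + freq_hz / 2) / freq_hz;
}

/*
 * Error, in shifted nanoseconds (ns << shift), that the mult/shift
 * conversion accumulates over freq_hz cycles, i.e. one second of
 * counter time. Positive means the abstracted clock runs fast
 * relative to the nominal frequency, negative means slow.
 */
static int64_t shifted_ns_error_per_sec(uint32_t freq_hz, uint32_t shift)
{
	uint64_t mult = hz_to_mult(freq_hz, shift);

	return (int64_t)(freq_hz * mult) - (int64_t)(NSEC_PER_SEC << shift);
}
```

With the hypothetical 1.5 GHz counter and shift = 24, the error is
500000000 shifted ns per second, which is roughly 29.8 ns per second
fast. With 3 GHz it is 1000000000 shifted ns per second slow. Someone
dividing raw cycles by the nominal frequency would see neither error.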

>> We might be able to reduce the degree in this case, but I worry the
>> extra complexity may only cause problems for others.
>
> Is it really that complex to add a fixed correction value periodically?
>
> I don't think so and it should just work for any clocksource which is
> exposed this way. Famous last words .....

I'm not saying that the code is overly complex (at least compared to
the rest of the timekeeping code :), but just that how the accumulation
is done becomes less trivial. So if someone else is trying to mimic the
abstracted time with unabstracted hardware values (again, not
something I recommend, but that's sort of the usage case pushing
this), they need to use a similar method that is slightly more
complicated (or use slower math). It's all subtle stuff, but this makes
something that was relatively simple (by design) a bit harder to
explain.
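
A minimal sketch of what "a similar method" might look like, under
loud assumptions: mult is truncated (the "reduce mult by one" case) so
the correction is never negative, the correction interval is exactly
one second of cycles, and the correction is injected once per interval.
The struct, the function names, and the 1.5 GHz / shift = 24 figures in
the note below are hypothetical. This is not the kernel's accumulation
code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

struct raw_conv {
	uint64_t mult;            /* truncated, so the clock is never fast */
	uint32_t shift;
	uint64_t cycle_interval;  /* cycles per correction interval (1 s) */
	uint64_t corr;            /* shifted ns injected once per interval */
};

static struct raw_conv raw_conv_init(uint32_t freq_hz, uint32_t shift)
{
	struct raw_conv c;

	assert(freq_hz != 0 && shift < 32);
	c.mult = (NSEC_PER_SEC << shift) / freq_hz;
	c.shift = shift;
	c.cycle_interval = freq_hz;
	/* What one interval is short of a full second, in shifted ns. */
	c.corr = (NSEC_PER_SEC << shift) - c.cycle_interval * c.mult;
	return c;
}

/* Plain mult/shift: equals floor(cycles * mult / 2^shift), split to fit u64. */
static uint64_t cycles_to_ns_plain(const struct raw_conv *c, uint64_t cycles)
{
	uint64_t intervals = cycles / c->cycle_interval;
	uint64_t rem = cycles % c->cycle_interval;
	uint64_t per_interval = c->cycle_interval * c->mult;
	uint64_t whole = per_interval >> c->shift;
	uint64_t frac = per_interval & ((1ULL << c->shift) - 1);

	assert(intervals < (1ULL << 32)); /* keeps the sums below in range */
	return intervals * whole +
	       ((intervals * frac + rem * c->mult) >> c->shift);
}

/* Same conversion, but each completed interval also receives corr. */
static uint64_t cycles_to_ns_corrected(const struct raw_conv *c, uint64_t cycles)
{
	uint64_t intervals = cycles / c->cycle_interval;
	uint64_t rem = cycles % c->cycle_interval;
	/* By construction this is exactly NSEC_PER_SEC << shift. */
	uint64_t per_interval = c->cycle_interval * c->mult + c->corr;

	assert(intervals < (1ULL << 32));
	return intervals * (per_interval >> c->shift) +
	       ((rem * c->mult) >> c->shift);
}
```

For the hypothetical 1.5 GHz counter with shift = 24, the plain
conversion of one hour of cycles comes out about 214 us short of
3600 s, while the corrected one lands on it exactly. The cost is that
anyone reproducing the value from raw cycles has to split the count
into whole intervals plus a remainder instead of doing one multiply
and shift.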

So I'm not outright objecting, I'm more wringing my hands at the
potential edge cases that improving the accuracy of the un-adjusted
clock raises. :)

thanks
-john