Re: [RFC PATCH V2 1/9] timekeeping: Expose the conversion information of monotonic raw

From: John Stultz
Date: Mon Feb 13 2023 - 14:28:31 EST


On Mon, Feb 13, 2023 at 11:08 AM <kan.liang@xxxxxxxxxxxxxxx> wrote:
>
> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>
> The conversion information of monotonic raw is not affected by NTP/PTP
> correction. The perf tool can utilize the information to correctly
> calculate the monotonic raw via a TSC in each PEBS record in the
> post-processing stage.
>
> The current conversion information is hidden in the internal
> struct tk_read_base. Add a new external struct ktime_conv to store and
> share the conversion information with other subsystems.
>
> Add a new interface ktime_get_fast_mono_raw_conv() to expose the
> conversion information of monotonic raw. The function probably be
> invoked in a NMI. Use NMI safe tk_fast_raw to retrieve the conversion
> information.
>
> Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> ---
> include/linux/timekeeping.h | 18 ++++++++++++++++++
> kernel/time/timekeeping.c | 24 ++++++++++++++++++++++++
> 2 files changed, 42 insertions(+)
>
> diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h
> index fe1e467ba046..94ba02e7eb13 100644
> --- a/include/linux/timekeeping.h
> +++ b/include/linux/timekeeping.h
> @@ -253,6 +253,21 @@ struct system_time_snapshot {
> u8 cs_was_changed_seq;
> };
>
> +/**
> + * struct ktime_conv - Timestamp conversion information
> + * @mult: Multiplier for scaled math conversion
> + * @shift: Shift value for scaled math conversion
> + * @xtime_nsec: Shifted (fractional) nano seconds offset for readout
> + * @base: (nanoseconds) base time for readout
> + */
> +struct ktime_conv {
> + u64 cycle_last;
> + u32 mult;
> + u32 shift;
> + u64 xtime_nsec;
> + u64 base;
> +};
> +
> /**
> * struct system_device_crosststamp - system/device cross-timestamp
> * (synchronized capture)
> @@ -297,6 +312,9 @@ extern void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot);
> /* NMI safe mono/boot/realtime timestamps */
> extern void ktime_get_fast_timestamps(struct ktime_timestamps *snap);
>
> +/* NMI safe mono raw conv information */
> +extern void ktime_get_fast_mono_raw_conv(struct ktime_conv *conv);
> +
> /*
> * Persistent clock related interfaces
> */
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 5579ead449f2..a202b7a0a249 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -505,6 +505,30 @@ u64 notrace ktime_get_raw_fast_ns(void)
> }
> EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
>
> +/**
> + * ktime_get_fast_mono_raw_conv - NMI safe access to get the conversion
> + * information of clock monotonic raw
> + *
> + * The conversion information is not affected by NTP/PTP correction.
> + */
> +void ktime_get_fast_mono_raw_conv(struct ktime_conv *conv)
> +{
> + struct tk_fast *tkf = &tk_fast_raw;
> + struct tk_read_base *tkr;
> + unsigned int seq;
> +
> + do {
> + seq = raw_read_seqcount_latch(&tkf->seq);
> + tkr = tkf->base + (seq & 0x01);
> + conv->cycle_last = tkr->cycle_last;
> + conv->mult = tkr->mult;
> + conv->shift = tkr->shift;
> + conv->xtime_nsec = tkr->xtime_nsec;
> + conv->base = tkr->base;
> + } while (read_seqcount_latch_retry(&tkf->seq, seq));
> +}
> +EXPORT_SYMBOL_GPL(ktime_get_fast_mono_raw_conv);

Thanks for taking another pass at this! Using CLOCK_MONOTONIC_RAW
removes a lot of the issues around time inconsistencies.

Though, I'm not super excited about exporting a lot of timekeeping
state out to drivers to have drivers then duplicate timekeeping logic.

Would it make more sense to have the timekeeping core export an
interface like: ktime_get_mono_raw_from_timestamp(struct clocksource
*cs, cycle_t timestamp)?

The complexity is that the timestamp may be pretty far in the past, so
special handling will be needed to do the mult/shift conversion for a
large negative delta.

Also we need some way of checking that the current clocksource
(because it can change) matches the timestamp source?

Maybe some get_mono_raw_timestamp(&cs) accessor that captures both the
current clocksource and the timestamp?

I've not thought this out fully, but curious if something like that
might work for you and also encapsulate the timekeeping logic better
so we don't have to have that logic leak out to various driver
implementations.

thanks
-john