RE: [PATCH] hv_utils: return error if host timesysnc update is stale

From: Michael Kelley
Date: Thu Aug 20 2020 - 14:21:17 EST


From: Vineeth Pillai <viremana@xxxxxxxxxxxxxxxxxxx> Sent: Wednesday, August 19, 2020 10:45 AM
>
> If for any reason, host timesync messages were not processed by
> the guest, hv_ptp_gettime() returns a stale value and the
> caller (clock_gettime, PTP ioctl etc) has no means to know this
> now. Return an error so that the caller knows about this.
>
> Signed-off-by: Vineeth Pillai <viremana@xxxxxxxxxxxxxxxxxxx>
> ---
> drivers/hv/hv_util.c | 46 +++++++++++++++++++++++++++++++++-----------
> 1 file changed, 35 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
> index 92ee0fe4c919..1357861fd8ae 100644
> --- a/drivers/hv/hv_util.c
> +++ b/drivers/hv/hv_util.c
> @@ -282,26 +282,52 @@ static struct {
> spinlock_t lock;
> } host_ts;
>
> -static struct timespec64 hv_get_adj_host_time(void)
> +static inline u64 reftime_to_ns(u64 reftime)
> {
> - struct timespec64 ts;
> - u64 newtime, reftime;
> + return (reftime - WLTIMEDELTA) * 100;
> +}
> +
> +/*
> + * Hard coded threshold for host timesync delay: 600 seconds
> + */
> +const u64 HOST_TIMESYNC_DELAY_THRESH = 600 * NSEC_PER_SEC;

Kernel test robot has already complained that this should be static,
and about the potential overflow based on the types of the constants in
the right side expression. I didn't check the details, but I suspect the
complaint is when building in 32-bit mode. This code does get built in
32-bit mode and it's possible for run 32-bit Linux guests on Hyper-V.

> +
> +static int hv_get_adj_host_time(struct timespec64 *ts)
> +{
> + u64 newtime, reftime, timediff_adj;
> unsigned long flags;
> + int ret = 0;
>
> spin_lock_irqsave(&host_ts.lock, flags);
> reftime = hv_read_reference_counter();
> - newtime = host_ts.host_time + (reftime - host_ts.ref_time);
> - ts = ns_to_timespec64((newtime - WLTIMEDELTA) * 100);
> +
> + /*
> + * We need to let the caller know that last update from host
> + * is older than the max allowable threshold. clock_gettime()
> + * and PTP ioctl do not have a documented error that we could
> + * return for this specific case. Use ESTALE to report this.
> + */
> + timediff_adj = reftime - host_ts.ref_time;
> + if (timediff_adj * 100 > HOST_TIMESYNC_DELAY_THRESH) {
> + pr_warn("TIMESYNC IC: Stale time stamp, %llu nsecs old\n",
> + HOST_TIMESYNC_DELAY_THRESH);

Let's provide the timediff_adj in the message instead of the constant
threshold value so we know the degree of staleness. :-)

Also, I'm wondering if this should be pr_warn_once(). Presumably
chronyd or whoever is reading /dev/ptp0 will give up after getting
this error, but if not, it would be nice to avoid filling up the console
with these error messages.

> + ret = -ESTALE;
> + }
> +
> + newtime = host_ts.host_time + timediff_adj;
> + *ts = ns_to_timespec64(reftime_to_ns(newtime));
> spin_unlock_irqrestore(&host_ts.lock, flags);
>
> - return ts;
> + return ret;
> }
>
> static void hv_set_host_time(struct work_struct *work)
> {
> - struct timespec64 ts = hv_get_adj_host_time();
>
> - do_settimeofday64(&ts);
> + struct timespec64 ts;
> +
> + if (!hv_get_adj_host_time(&ts))
> + do_settimeofday64(&ts);
> }
>
> /*
> @@ -622,9 +648,7 @@ static int hv_ptp_adjtime(struct ptp_clock_info *ptp, s64 delta)
>
> static int hv_ptp_gettime(struct ptp_clock_info *info, struct timespec64 *ts)
> {
> - *ts = hv_get_adj_host_time();
> -
> - return 0;
> + return hv_get_adj_host_time(ts);
> }
>
> static struct ptp_clock_info ptp_hyperv_info = {
> --
> 2.17.1