Re: [RFC 1/2] x86/tsc: add a timer to make sure tsc_adjust is always checked

From: Thomas Gleixner
Date: Sat Apr 10 2021 - 05:27:15 EST


On Tue, Mar 30 2021 at 16:25, Feng Tang wrote:
> Normally the tsc_sync will be checked every time system enters idle state,
> but there is still caveat that a system won't enter idle, either because
> it's too busy or configured purposely to not enter idle. Setup a periodic
> timer to make sure the check is always on.

Bah. I really hate the fact that we don't have a knob to disable writes
to the TSC/TSC_ADJUST msrs. That would spare this business alltogether.

> +/*
> + * Normally the tsc_sync will be checked every time system enters idle state,
> + * but there is still caveat that a system won't enter idle, either because
> + * it's too busy or configured purposely to not enter idle.
> + *
> + * So setup a periodic timer to make sure the check is always on.
> + */
> +
> +#define SYNC_CHECK_INTERVAL (HZ * 600)
> +static void tsc_sync_check_timer_fn(struct timer_list *unused)

I've surely mentioned this before that glueing a define without an empty
newline to a function definition is horrible to read.

> +{
> + int next_cpu;
> +
> + tsc_verify_tsc_adjust(false);
> +
> + /* Loop to do the check for all onlined CPUs */

I don't see a loop here.

> + next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);

Why raw_smp_processor_id()? What's wrong with smp_processor_id()?

> + if (next_cpu >= nr_cpu_ids)
> + next_cpu = cpumask_first(cpu_online_mask);
> +
> + tsc_sync_check_timer.expires += SYNC_CHECK_INTERVAL;
> + add_timer_on(&tsc_sync_check_timer, next_cpu);
> +}
> +
> +static int __init start_sync_check_timer(void)
> +{
> + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> + return 0;
> +
> + timer_setup(&tsc_sync_check_timer, tsc_sync_check_timer_fn, 0);
> + tsc_sync_check_timer.expires = jiffies + SYNC_CHECK_INTERVAL;
> + add_timer(&tsc_sync_check_timer);
> +
> + return 0;
> +}
> +late_initcall(start_sync_check_timer);

So right now, if someone adds 'tsc=reliable' on the kernel command line
then all of the watchdog checking, except for the idle enter TSC_ADJUST
check is disabled. The NOHZ full people are probably going to be pretty
unhappy about yet another unconditional timer they have to chase down.

So this needs some more thought.

Thanks,

tglx