Re: [tip:x86/timers] x86/tsc: Make calibration refinement more robust

From: Daniel Vacek
Date: Tue Jan 08 2019 - 08:14:49 EST


Hi Thomas, Ingo, Peter.

I'm wondering, was the x86/timers branch of the tip tree merged into Linus'
tree for v5.0-rc1? Somehow I do not see that this patch made it through...

Am I doing something wrong?

--nX

On Tue, Nov 6, 2018 at 9:58 PM tip-bot for Daniel Vacek
<tipbot@xxxxxxxxx> wrote:
>
> Commit-ID: a786ef152cdcfebc923a67f63c7815806eefcf81
> Gitweb: https://git.kernel.org/tip/a786ef152cdcfebc923a67f63c7815806eefcf81
> Author: Daniel Vacek <neelx@xxxxxxxxxx>
> AuthorDate: Mon, 5 Nov 2018 18:10:40 +0100
> Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> CommitDate: Tue, 6 Nov 2018 21:53:15 +0100
>
> x86/tsc: Make calibration refinement more robust
>
> The threshold in tsc_read_refs() is a constant, which may favor slower CPUs
> but is not necessarily optimal for a simple reference read on faster ones.
>
> Hence make it proportional to tsc_khz, when available, to compensate for
> this. The threshold guards against any disturbance like IRQs, NMIs, SMIs
> or CPU stealing by the host on guest systems, so rename it accordingly and
> fix the comments as well.
>
> Also, on some systems there is noticeable DMI bus contention at some point
> during boot which keeps the readout failing (observed in roughly one in ~300
> boots when testing). In that case, also retry the second readout instead of
> simply bailing out unrefined. Usually, a second later the readout returns
> quickly without any issues.
>
> Signed-off-by: Daniel Vacek <neelx@xxxxxxxxxx>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Link: https://lkml.kernel.org/r/1541437840-29293-1-git-send-email-neelx@xxxxxxxxxx
>
> ---
> arch/x86/kernel/tsc.c | 30 ++++++++++++++++--------------
> 1 file changed, 16 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index e9f777bfed40..3fae23834069 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -297,15 +297,16 @@ static int __init tsc_setup(char *str)
>
> __setup("tsc=", tsc_setup);
>
> -#define MAX_RETRIES 5
> -#define SMI_TRESHOLD 50000
> +#define MAX_RETRIES 5
> +#define TSC_DEFAULT_THRESHOLD 0x20000
>
> /*
> - * Read TSC and the reference counters. Take care of SMI disturbance
> + * Read TSC and the reference counters. Take care of any disturbances
> */
> static u64 tsc_read_refs(u64 *p, int hpet)
> {
> u64 t1, t2;
> + u64 thresh = tsc_khz ? tsc_khz >> 5 : TSC_DEFAULT_THRESHOLD;
> int i;
>
> for (i = 0; i < MAX_RETRIES; i++) {
> @@ -315,7 +316,7 @@ static u64 tsc_read_refs(u64 *p, int hpet)
> else
> *p = acpi_pm_read_early();
> t2 = get_cycles();
> - if ((t2 - t1) < SMI_TRESHOLD)
> + if ((t2 - t1) < thresh)
> return t2;
> }
> return ULLONG_MAX;
> @@ -703,15 +704,15 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
> * zero. In each wait loop iteration we read the TSC and check
> * the delta to the previous read. We keep track of the min
> * and max values of that delta. The delta is mostly defined
> - * by the IO time of the PIT access, so we can detect when a
> - * SMI/SMM disturbance happened between the two reads. If the
> + * by the IO time of the PIT access, so we can detect when
> + * any disturbance happened between the two reads. If the
> * maximum time is significantly larger than the minimum time,
> * then we discard the result and have another try.
> *
> * 2) Reference counter. If available we use the HPET or the
> * PMTIMER as a reference to check the sanity of that value.
> * We use separate TSC readouts and check inside of the
> - * reference read for a SMI/SMM disturbance. We dicard
> + * reference read for any possible disturbance. We dicard
> * disturbed values here as well. We do that around the PIT
> * calibration delay loop as we have to wait for a certain
> * amount of time anyway.
> @@ -744,7 +745,7 @@ static unsigned long pit_hpet_ptimer_calibrate_cpu(void)
> if (ref1 == ref2)
> continue;
>
> - /* Check, whether the sampling was disturbed by an SMI */
> + /* Check, whether the sampling was disturbed */
> if (tsc1 == ULLONG_MAX || tsc2 == ULLONG_MAX)
> continue;
>
> @@ -1268,7 +1269,7 @@ static DECLARE_DELAYED_WORK(tsc_irqwork, tsc_refine_calibration_work);
> */
> static void tsc_refine_calibration_work(struct work_struct *work)
> {
> - static u64 tsc_start = -1, ref_start;
> + static u64 tsc_start = ULLONG_MAX, ref_start;
> static int hpet;
> u64 tsc_stop, ref_stop, delta;
> unsigned long freq;
> @@ -1283,14 +1284,15 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> * delayed the first time we expire. So set the workqueue
> * again once we know timers are working.
> */
> - if (tsc_start == -1) {
> + if (tsc_start == ULLONG_MAX) {
> +restart:
> /*
> * Only set hpet once, to avoid mixing hardware
> * if the hpet becomes enabled later.
> */
> hpet = is_hpet_enabled();
> - schedule_delayed_work(&tsc_irqwork, HZ);
> tsc_start = tsc_read_refs(&ref_start, hpet);
> + schedule_delayed_work(&tsc_irqwork, HZ);
> return;
> }
>
> @@ -1300,9 +1302,9 @@ static void tsc_refine_calibration_work(struct work_struct *work)
> if (ref_start == ref_stop)
> goto out;
>
> - /* Check, whether the sampling was disturbed by an SMI */
> - if (tsc_start == ULLONG_MAX || tsc_stop == ULLONG_MAX)
> - goto out;
> + /* Check, whether the sampling was disturbed */
> + if (tsc_stop == ULLONG_MAX)
> + goto restart;
>
> delta = tsc_stop - tsc_start;
> delta *= 1000000LL;
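
P.S. a quick back-of-the-envelope note on the new threshold, in case the
numbers look arbitrary: tsc_khz >> 5 is tsc_khz/32, and since tsc_khz is
TSC cycles per millisecond that works out to a fixed ~31.25 us window on
any CPU, while the TSC_DEFAULT_THRESHOLD fallback of 0x20000 cycles is the
same ~31 us window assuming a ~4.2 GHz TSC. Below is a minimal standalone
sketch, not part of the patch; the 2.9 GHz tsc_khz value is just an assumed
example:

#include <stdio.h>

#define TSC_DEFAULT_THRESHOLD 0x20000  /* fallback before tsc_khz is known */

int main(void)
{
        /* assumed example: a 2.9 GHz TSC, i.e. tsc_khz = 2900000 */
        unsigned long long tsc_khz = 2900000ULL;

        /* same computation as in tsc_read_refs() after the patch */
        unsigned long long thresh = tsc_khz ? tsc_khz >> 5 : TSC_DEFAULT_THRESHOLD;

        /* tsc_khz is cycles per millisecond, so the window is 1/32 ms */
        double window_us = (double)thresh / (double)tsc_khz * 1000.0;

        printf("threshold: %llu cycles (~%.2f us)\n", thresh, window_us);
        return 0;
}

With the assumed 2.9 GHz value the threshold comes out to 90625 cycles,
i.e. the same ~31.25 us window as on any other CPU, which is the point of
scaling it by tsc_khz instead of keeping the old fixed SMI_TRESHOLD.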