Re: [RESEND PATCH] clocksource/arm_arch_timer: Fix masking for high freq counters

From: Marc Zyngier
Date: Sat Aug 07 2021 - 06:52:22 EST


Hi Oliver,

On Fri, 06 Aug 2021 19:21:26 +0100,
Oliver Upton <oupton@xxxxxxxxxx> wrote:
>
> Unfortunately, the architecture provides no means to determine the bit
> width of the system counter. However, we do know the following from the
> specification:
>
> - the system counter is at least 56 bits wide
> - Roll-over time of not less than 40 years
>
> To date, the arch timer driver has depended on the first property,
> assuming any system counter to be 56 bits wide and masking off the rest.
> However, combining a narrow clocksource mask with a high frequency
> counter could result in prematurely wrapping the system counter by a
> significant margin. For example, a 56 bit wide, 1GHz system counter
> would wrap in a mere 2.28 years!
>
> This is a problem for two reasons: v8.6+ implementations are required to
> provide a 64 bit, 1GHz system counter. Furthermore, before v8.6,
> implementers may select a counter frequency of their choosing.
>
> Fix the issue by deriving a valid clock mask based on the second
> property from above. Set the floor at 56 bits, since we know no system
> counter is narrower than that.
>
> Suggested-by: Marc Zyngier <maz@xxxxxxxxxx>
> Signed-off-by: Oliver Upton <oupton@xxxxxxxxxx>
> ---
> This patch was tested with QEMU, tweaked to provide a 1GHz system
> counter frequency, as I could not easily figure out how to tweak the
> base FVP to provide a 1GHz counter.

<FVP>
"bp.refcounter.base_frequency" is the property you are looking for. In
general, passing --list-params to the model reveals a treasure trove
of weird and wonderful options that can be used to configure the model
to your liking.
</FVP>

>
> Parent commit: 0c32706dac1b ("arm64: stacktrace: avoid tracing arch_stack_walk()")
>
> drivers/clocksource/arm_arch_timer.c | 28 ++++++++++++++++++++++++++--
> 1 file changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index be6d741d404c..8c41626a4c8a 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -52,6 +52,12 @@
> #define CNTV_TVAL 0x38
> #define CNTV_CTL 0x3c
>
> +/*
> + * The minimum amount of time a generic timer is guaranteed to not roll over

nit: s/timer/counter/

> + * (40 years)

For later reference, could you add the section of the ARMv8 ARM where
this is mentioned? Something like 'ARM DDI 0487G.a D11.1.2 ("The
system counter")', either here or in the comment further down.

> + */
> +#define MIN_ROLLOVER_SECS (40ULL * 365 * 24 * 3600)
> +
> static unsigned arch_timers_present __initdata;
>
> static void __iomem *arch_counter_base __ro_after_init;
> @@ -1004,9 +1010,24 @@ struct arch_timer_kvm_info *arch_timer_get_kvm_info(void)
> return &arch_timer_kvm_info;
> }
>
> +/*
> + * Makes an educated guess at a valid counter width based on the Generic Timer
> + * specification. Of note:
> + * 1) the Generic Timer is at least 56 bits wide
> + * 2) a roll-over time of not less than 40 years
> + */
> +static int __init arch_counter_get_width(void)
> +{
> + u64 min_cycles = MIN_ROLLOVER_SECS * arch_timer_get_cntfrq();

That's unfortunately wishful thinking. We have stupidly broken systems
out there that do not set CNTFRQ_EL0, and instead rely on DT
properties to describe the counter frequency. You're likely to end up
with a glorious zero as a result, with interesting consequences...

Use arch_timer_rate instead, which will have been set by the time you
register the counter.

> +
> + /* guarantee the returned width is within the valid range */
> + return max(56, min(64, ilog2(min_cycles)));

Maybe better written as "clamp_val(ilog2(min_cycles), 56, 64);".

> +}
> +
> static void __init arch_counter_register(unsigned type)
> {
> u64 start_count;
> + int width;
>
> /* Register the CP15 based counter if we have one */
> if (type & ARCH_TIMER_TYPE_CP15) {
> @@ -1031,6 +1052,10 @@ static void __init arch_counter_register(unsigned type)
> arch_timer_read_counter = arch_counter_get_cntvct_mem;
> }
>
> + width = arch_counter_get_width();
> + clocksource_counter.mask = CLOCKSOURCE_MASK(width);
> + cyclecounter.mask = CLOCKSOURCE_MASK(width);

Since you move this to be computed at runtime, how about dropping the
static initialisation of the mask fields?

> +
> if (!arch_counter_suspend_stop)
> clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
> start_count = arch_timer_read_counter();
> @@ -1040,8 +1065,7 @@ static void __init arch_counter_register(unsigned type)
> timecounter_init(&arch_timer_kvm_info.timecounter,
> &cyclecounter, start_count);
>
> - /* 56 bits minimum, so we assume worst case rollover */
> - sched_clock_register(arch_timer_read_counter, 56, arch_timer_rate);
> + sched_clock_register(arch_timer_read_counter, width, arch_timer_rate);
> }
>
> static void arch_timer_stop(struct clock_event_device *clk)

Another thing that needs addressing for high frequency counter support
is to move away from TVAL programming and switch to CVAL, as the
maximum deadline we can currently program with the 32-bit TVAL is
roughly 4.3s at 1GHz.

Fun fact: it has the interesting consequence of breaking XGene-1,
which implemented CVAL in terms of TVAL instead of the other way
around (what were these guys thinking?), though I don't think anyone
will notice in practice. I have a preliminary patch on a branch
somewhere that I'll try to dust off and post in the coming days.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.