Re: [PATCH] arm64: arch_timer: XGene-1 has 31 bit, not 32 bit, arch timer.

From: Marc Zyngier
Date: Sat Oct 22 2022 - 06:42:32 EST


Hi Joe,

On Fri, 21 Oct 2022 20:47:46 +0100,
Joe Korty <joe.korty@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Marc,
>
> On Fri, Oct 21, 2022 at 07:08:50PM +0100, Marc Zyngier wrote:
> > Sorry, but you'll have to provide a bit more of an analysis here. As
> > far as I can tell, you're just changing a parameter without properly
> > describing what breaks and how.
>
> There isn't much to analyse.

Actually, there is plenty to analyse. Starting with *why* 31 is the
correct value (it actually is, see below) other than "hey, I reverted
this and it's all good, just merge it".

> For ages, 0x7fffffff (31 bits) was the
> declared width of 'arch timer' for all arm architectures, and that worked.
> Your patch series made the declared width vary according to which chipset
> was in use, which is good, but that rewrite changed the above mask for
> the XGene-1 from 0x7fffffff to 0xffffffff.

This isn't quite what my changes did, but hey, let's not get derailed.

> That change broke timers
> for the XGene-1 since it seems that, in actuality, it has only a 31 bit
> wide arch timer. Thus declaring that the arch timer has 32 bits is wrong.
> This mismatch between the actual and declared sizes would cause arithmetic
> errors in the calculation of timer deltas which more than accounts for
> the hrtimer failures I am seeing when running 5.16+ on my Mustang XGene1.

This is the important point, and the reason why it breaks:

XGene implements CVAL (a 64-bit comparator) in terms of TVAL (a
countdown register) instead of the other way around. TVAL being a
32-bit register, the width of the counter should equally be 32.
However, TVAL is a *signed* value, and keeps counting down in the
negative range once the timer fires.

It means that any TVAL value with bit 31 set will fire immediately, as
it cannot be distinguished from an already expired timer. Reducing the
timer range back to a paltry 31 bits papers over the issue.

Another problem cannot be fixed though, which is that the timer
interrupt *must* be handled within the negative countdown period, or
the interrupt will be lost (TVAL will rollover to a positive value,
indicative of a new timer deadline).

> Only one line needs to change; the rest is fluff:
>
> - return CLOCKSOURCE_MASK(32);
> + return CLOCKSOURCE_MASK(31);

Yes, and all you need is to send a proper patch, see below.

>
> > Also, this isn't much of a patch.
>
> I don't know what this means. The patch contains all that is needed for
> the fix, no more. I could add more comments as to _why_ it is 31 bits
> not 32, but I don't know why. I only know that the motherboard behaves
> as if 31 bits is all that is available in the hardware.
>
> > Please see the documentation on how to properly submit one.
>
> AFAICS, the only submission mistake is that the 'Cc: stable@xxxxxxxxxxxxxxx'
> line is missing.

What you have done here is to write an email with a diff appended to
it, which isn't a proper kernel patch. I expect a patch to be
formatted with "git format-patch" instead of "git diff"
(i.e. something that is an actual commit instead of a local diff),
with a proper commit message (feel free to nick some of the
description above), with a Cc: stable@ and a Fixes: tag at the right
spot, Cc'ing all the relevant maintainers.

All of this is eloquently explained in the kernel documentation
(Documentation/process/submitting-patches.rst), and I would definitely
encourage you to read the sections titled "Describe your changes" and
"The canonical patch format". You can also look at the previous
commits to the same file to get a sense of the formatting that people
use.

Thanks,

M.

--
Without deviation from the norm, progress is not possible.