clock_gettime64 vdso bug on 32-bit arm, rpi-4

From: Arnd Bergmann
Date: Tue May 19 2020 - 15:54:24 EST


Jack Schmidt reported a bug for the arm32 clock_gettime64 vdso call last
month: https://github.com/richfelker/musl-cross-make/issues/96 and
https://github.com/raspberrypi/linux/issues/3579

As Will Deacon pointed out, this was never reported on the mailing list,
so I'll try to summarize what we know in the hope that it can be resolved
soon.

- This happened reproducibly with a 32-bit Raspberry Pi patched Linux-5.6
kernel running on a 64-bit Raspberry Pi 4b (bcm2711) when calling
clock_gettime64(CLOCK_REALTIME).

- The kernel tree is at https://github.com/raspberrypi/linux/, but I could
see no relevant changes compared to a mainline kernel.

- From the report, I see that the returned time value is larger than the
expected time by 3.4 to 14.5 million seconds in four samples. My
guess is that a random number gets added in at some point.

- From other sources, I found that the Raspberry Pi clocksource runs
at 54 MHz, with a mask value of 0xffffffffffffff. From these numbers
I would expect that reading a completely random hardware register
value would result in an offset of up to 1.33 billion seconds, which is
roughly a factor of 100 more than the error we see, though still in a
similar range.

- The test case calls the musl clock_gettime() function, which falls back to
the clock_gettime64() syscall on kernels prior to 5.5, or to the 32-bit
clock_gettime() prior to Linux-5.1. As reported in the bug, Linux-4.19 does
not show the bug.

- The behavior was not reproduced on the same user space in qemu,
though I cannot tell whether the exact same kernel binary was used.

- glibc-2.31 calls the same clock_gettime64() vdso function on arm to
implement clock_gettime(), but earlier versions did not. I have not
seen any reports of this bug, which could be explained by users
generally being on older versions.

- As far as I can tell, there are no reports of this bug from other users,
and so far nobody could reproduce it.

- The current musl git tree has been patched to not call clock_gettime64
on ARM because of this problem, so it cannot be used for reproducing it.
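
For reference, the 1.33 billion second estimate above is just the counter
mask divided by the counter frequency. A minimal sketch of that arithmetic
follows; the function name is hypothetical and only serves the illustration,
while the mask and frequency are the values quoted above.

```c
#include <stdint.h>

/*
 * Largest offset, in seconds, that a free-running counter with the given
 * mask and frequency can represent before it wraps.
 * (Hypothetical helper, not a kernel function.)
 */
static double max_counter_offset_seconds(uint64_t mask, uint64_t freq_hz)
{
	return (double)mask / (double)freq_hz;
}
```

Calling max_counter_offset_seconds(0xffffffffffffffULL, 54000000ULL) gives
about 1.33e9 seconds, which is roughly 92 times the largest reported error
of 14.5 million seconds.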

If anyone has other information that may help figure out what is going
on, please share.
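
One way to gather more data might be to compare the libc result against a
direct system call in the same process. The following is only a sketch under
a few assumptions: the helper names and struct ts64 are made up for this
example, it assumes that the libc clock_gettime() goes through the vdso
(which, as noted above, is no longer the case for current musl git on ARM),
and it assumes that SYS_clock_gettime64 is defined by the libc headers on
32-bit targets with time64 support.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Layout used by the time64 syscall on 32-bit kernels: two 64-bit fields. */
struct ts64 {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Read CLOCK_REALTIME through a raw system call, bypassing the vdso. */
static int realtime_via_syscall(int64_t *sec)
{
#ifdef SYS_clock_gettime64
	/* 32-bit target with time64 support */
	struct ts64 ts;

	if (syscall(SYS_clock_gettime64, CLOCK_REALTIME, &ts) != 0)
		return -1;
#else
	/* 64-bit target, or headers without the time64 syscall number */
	struct timespec ts;

	if (syscall(SYS_clock_gettime, CLOCK_REALTIME, &ts) != 0)
		return -1;
#endif
	*sec = ts.tv_sec;
	return 0;
}

/*
 * Seconds by which the libc path is ahead of the raw syscall.
 * Returns INT64_MIN if either call fails.
 */
static int64_t realtime_skew_seconds(void)
{
	struct timespec libc_ts;
	int64_t sys_sec;

	if (clock_gettime(CLOCK_REALTIME, &libc_ts) != 0 ||
	    realtime_via_syscall(&sys_sec) != 0)
		return INT64_MIN;

	return (int64_t)libc_ts.tv_sec - sys_sec;
}
```

On a system that behaves correctly, realtime_skew_seconds() should stay
within about a second of zero. A value in the millions on an affected
system would point at the vdso path rather than the timekeeping core.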

Arnd