Re: clock_gettime64 vdso bug on 32-bit arm, rpi-4

From: Rich Felker
Date: Tue May 19 2020 - 16:41:50 EST


On Tue, May 19, 2020 at 05:24:18PM -0300, Adhemerval Zanella wrote:
>
>
> On 19/05/2020 16:54, Arnd Bergmann wrote:
> > Jack Schmidt reported a bug for the arm32 clock_gettimeofday64 vdso call last
> > month: https://github.com/richfelker/musl-cross-make/issues/96 and
> > https://github.com/raspberrypi/linux/issues/3579
> >
> > As Will Deacon pointed out, this was never reported on the mailing list,
> > so I'll try to summarize what we know, so this can hopefully be resolved soon.
> >
> > - This happened reproducibly on Linux-5.6 on a 32-bit Raspberry Pi patched
> > kernel running on a 64-bit Raspberry Pi 4b (bcm2711) when calling
> > clock_gettime64(CLOCK_REALTIME)
>
> Does it happen with other clocks as well?
>
> >
> > - The kernel tree is at https://github.com/raspberrypi/linux/, but I could
> > see no relevant changes compared to a mainline kernel.
>
> Is this bug reproducible with mainline kernel or mainline kernel can't be
> booted on bcm2711?
>
> >
> > - From the report, I see that the returned time value is larger than the
> > expected time, by 3.4 to 14.5 million seconds in four samples, my
> > guess is that a random number gets added in at some point.
>
> What kind code are you using to reproduce it? It is threaded or issue
> clock_gettime from signal handlers?

Original report thread is here:

https://github.com/richfelker/musl-cross-make/issues/96

The reporter originally misunderstood the issue and wrongly attributed
it to difference between gettimeofday and clock_gettime but it was
just big jumps between successive vdso clock_gettime64 calls.

No transformation was being done on the output of the vdso function;
as long as it succeeds musl just returns directly with the value it
stored in the timespec. No threads or anything fancy were involved.

Current musl will no longer call it but you should be able to
dlopen("linux-gate.so.1", RTLD_NOW|RTLD_LOCAL) then use dlsym to get
its address and call it (not tested; I've never used it this way).

> > - The current musl git tree has been patched to not call clock_gettime64
> > on ARM because of this problem, so it cannot be used for reproducing it.
>
> So should glibc follow musl and remove arm clock_gettime6y4 vDSO support
> or this bug is localized to an specific kernel version running on an
> specific hardware?

For musl it was important to disable it asap pending a fix, because
users are expected to generate static binaries, and these could make
it into the wild without anyone realizing they're broken until much
later when run on an affected kernel (especially since pre-5.6 kernels
would hide the issue entirely due to lacking vdso). Ideally a fix will
be something we can detect (e.g. new symbol version) so as not to risk
calling the broken one, but whether that's necessary may depend on
what's affected.

I'm not sure if glibc should do the same; it's not often used in
static linking, and replacing libc (shared lib, or re-static-linking
which LGPL requires you to facilitate to distribute static binaries)
could solve the issue on affected systems.

Rich