Re: [PATCH v7] posix-timers: add clock_compare system call

From: Sagi Maimon
Date: Thu Mar 14 2024 - 08:20:05 EST


hi Thomas

On Thu, Mar 14, 2024 at 1:12 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Thu, Mar 14 2024 at 11:05, Sagi Maimon wrote:
> > Some user space applications need to read a couple of different clocks.
> > Each read requires moving from user space to kernel space.
> > Reading each clock separately (syscall) introduces extra
> > unpredictable/unmeasurable delay. Minimizing this delay contributes to user
> > space actions on these clocks (e.g. synchronization etc).
>
> I asked for a proper description of the actual problem several times now
> and you still provide some handwaving blurb. Feel free to ignore me, but
> then please don't be surprised if I ignore you too.
>
Nobody is ignoring your notes, and I address any notes given by any
maintainer in the most serious way.
As far as explaining the actual problem this is the best that I can,
but let me try to explain better:
We did many tests with different CPU loading and compared sampling the
same clock twice,
once in user space and once by using the system call.
We have noticed an improvement up to hundreds of nanoseconds while
using the system call.
Those results improved our ability to sync different PHCs

> Also why does reading two random clocks make any sense at all? Your code
> allows to read CLOCK_TAI and CLOCK_THREAD_CPUTIME_ID. What for?
>
Initially we needed to sync some different PHCs for our user space
application, that is why we came with this idea.
The first idea was an IOCTL that returned samples of several PHCs for
the need of synchronization.
Richard Cochran suggested a system call instead, which will add the
ability to get various system clocks, while this
implementation is more complex then IOCTL, I think that he was right,
for future usage.

> This is about PTP, no?
>
yes
> Again you completely fail to explain why the existing PTP ioctls are not
> sufficient for the purpose and why you need to provide a new interface
> which is completely ill defined.
>
The same as my answer above .....

> > arch/x86/entry/syscalls/syscall_64.tbl | 1 +
> > drivers/ptp/ptp_clock.c | 34 ++++--
> > include/linux/posix-clock.h | 2 +
> > include/linux/syscalls.h | 4 +
> > include/uapi/asm-generic/unistd.h | 5 +-
> > kernel/time/posix-clock.c | 25 +++++
> > kernel/time/posix-timers.c | 138 +++++++++++++++++++++++++
> > kernel/time/posix-timers.h | 2 +
>
> This needs to be split up into:
>
> 1) Infrastructure in posix-timers.c
> 2) Wire up the syscall in x86
> 3) Infrastructure in posix-clock.c
> 4) Usage in ptp_clock.c
>
> and not as a big lump of everything.
>
I know, but I think the benefit worth it
I agree that an IOCTL won't require such a big changes in the kernel code
> > +/**
> > + * clock_compare - Get couple of clocks time stamps
>
> So the name of the syscall suggest that it compares two clocks, but the
> actual functionality is to read two clocks.
>
> This does not make any sense at all. Naming matters.
>
you are right the name was suggested by Richard when the main idea was
returning the offset between the clocks.
If you have a better name, please tell me
> > + * @clock_a: clock a ID
> > + * @clock_b: clock b ID
> > + * @tp_a: Pointer to a user space timespec64 for clock a storage
> > + * @tp_b: Pointer to a user space timespec64 for clock b storage
> > + *
> > + * clock_compare gets time sample of two clocks.
> > + * Supported clocks IDs: PHC, virtual PHC and various system clocks.
> > + *
> > + * In case of PHC that supports crosstimespec and the other clock is Monotonic raw
> > + * or system time, crosstimespec will be used to synchronously capture
> > + * system/device time stamp.
> > + *
> > + * In other cases: Read clock_a twice (before, and after reading clock_b) and
> > + * average these times – to be as close as possible to the time we read clock_b.
> > + *
> > + * Returns:
> > + * 0 Success. @tp_a and @tp_b contains the time stamps
> > + * -EINVAL @clock a or b ID is not a valid clock ID
> > + * -EFAULT Copying the time stamp to @tp_a or @tp_b faulted
> > + * -EOPNOTSUPP Dynamic POSIX clock does not support crosstimespec()
> > + **/
> > +SYSCALL_DEFINE5(clock_compare, const clockid_t, clock_a, const clockid_t, clock_b,
> > + struct __kernel_timespec __user *, tp_a, struct __kernel_timespec __user *,
> > + tp_b, s64 __user *, offs_err)
> > +{
> > + struct timespec64 ts_a, ts_a1, ts_b, ts_a2;
> > + struct system_device_crosststamp xtstamp_a1, xtstamp_a2, xtstamp_b;
> > + const struct k_clock *kc_a, *kc_b;
> > + ktime_t ktime_a;
> > + s64 ts_offs_err = 0;
> > + int error = 0;
> > + bool crosstime_support_a = false;
> > + bool crosstime_support_b = false;
>
> Please read and follow the documentation provided at:
>
> https://www.kernel.org/doc/html/latest/process/maintainer-tip.html
>
> > + kc_a = clockid_to_kclock(clock_a);
> > + if (!kc_a) {
> > + error = -EINVAL;
> > + return error;
>
> What's wrong about 'return -EINVAL;' ?
>
correct will be fixed on next patch
> > + }
> > +
> > + kc_b = clockid_to_kclock(clock_b);
> > + if (!kc_b) {
> > + error = -EINVAL;
> > + return error;
> > + }
> > +
> > + // In case crosstimespec supported and b clock is Monotonic raw or system
> > + // time, synchronously capture system/device time stamp
>
> Please don't use C++ comments.
>
will be fixed on next patch
> > + if (clock_a < 0) {
>
> This is just wrong. posix-clocks ar not the only ones which have a
> negative clock id. See clockid_to_kclock()
>
> > + error = kc_a->clock_get_crosstimespec(clock_a, &xtstamp_a1);
>
> What's a crosstimespec?
>
> This function fills in a system_device_crosststamp, so why not name it
> so that the purpose of the function is obvious?
>
correct , it will be fixed on next patch
> ptp_clock::info::getcrosststamp() has a reasonable name. So why do you
> need to come up with something which makes the code obfuscated?
>
correct , it will be fixed on next patch
> > + if (!error) {
> > + if (clock_b == CLOCK_MONOTONIC_RAW) {
> > + ts_b = ktime_to_timespec64(xtstamp_a1.sys_monoraw);
> > + ts_a1 = ktime_to_timespec64(xtstamp_a1.device);
> > + goto out;
> > + } else if (clock_b == CLOCK_REALTIME) {
> > + ts_b = ktime_to_timespec64(xtstamp_a1.sys_realtime);
> > + ts_a1 = ktime_to_timespec64(xtstamp_a1.device);
> > + goto out;
> > + } else {
> > + crosstime_support_a = true;
>
> Right. If clock_b is anything else than CLOCK_MONOTONIC_RAW or
> CLOCK_REALTIME then this is true.
>
> > + }
> > + }
>
> So in case of an error, this just keeps going with an uninitialized
> xtstamp_a1 and if the clock_b part succeeds it continues and operates on
> garbage.
>
On error xtstamp_a1 will be taken again using clock_get_crosstimespec
so no one will be operating on garbage.
Please explain the problem better, because I don't see it.
> > + }
> > +
> > + // In case crosstimespec supported and a clock is Monotonic raw or system
> > + // time, synchronously capture system/device time stamp
> > + if (clock_b < 0) {
> > + // Synchronously capture system/device time stamp
> > + error = kc_b->clock_get_crosstimespec(clock_b, &xtstamp_b);
> > + if (!error) {
> > + if (clock_a == CLOCK_MONOTONIC_RAW) {
> > + ts_a1 = ktime_to_timespec64(xtstamp_b.sys_monoraw);
> > + ts_b = ktime_to_timespec64(xtstamp_b.device);
> > + goto out;
> > + } else if (clock_a == CLOCK_REALTIME) {
> > + ts_a1 = ktime_to_timespec64(xtstamp_b.sys_realtime);
> > + ts_b = ktime_to_timespec64(xtstamp_b.device);
> > + goto out;
> > + } else {
> > + crosstime_support_b = true;
> > + }
> > + }
> > + }
> > +
> > + if (crosstime_support_a)
> > + error = kc_a->clock_get_crosstimespec(clock_a, &xtstamp_a1);
>
> What? crosstime_support_a is only true when the exactly same call
> returned success. Why does it need to be called here again?
>
I wanted to take the three samples as close as possible to each other.
minimum code between the calls, that is why the call is done again.
> > + else
> > + error = kc_a->clock_get_timespec(clock_a, &ts_a1);
> > +
> > + if (error)
> > + return error;
> > +
> > + if (crosstime_support_b)
> > + error = kc_b->clock_get_crosstimespec(clock_b, &xtstamp_b);
> > + else
> > + error = kc_b->clock_get_timespec(clock_b, &ts_b);
> > +
> > + if (error)
> > + return error;
> > +
> > + if (crosstime_support_a)
> > + error = kc_a->clock_get_crosstimespec(clock_a, &xtstamp_a2);
> > + else
> > + error = kc_a->clock_get_timespec(clock_a, &ts_a2);
> > +
> > + if (error)
> > + return error;
>
> The logic and the code flow here are unreadable garbage and there are
> zero comments what this is supposed to do.
>
I will add comments.
please no need to use negative words like "garbage" (not the first time),
please keep it professional and civilized.
> > + if (crosstime_support_a) {
> > + ktime_a = ktime_sub(xtstamp_a2.device, xtstamp_a1.device);
> > + ts_offs_err = ktime_divns(ktime_a, 2);
> > + ktime_a = ktime_add_ns(xtstamp_a1.device, (u64)ts_offs_err);
> > + ts_a1 = ktime_to_timespec64(ktime_a);
>
> This is just wrong.
>
> read(a1);
> read(b);
> read(a2);
>
> You _CANNOT_ assume that (a1 + ((a2 - a1) / 2) is anywhere close to the
> point in time where 'b' is read. This code is preemtible and
> interruptible. I explained this to you before.
>
> Your explanation in the comment above the function is just wishful
> thinking.
>
you explained it before, but still it is better then two consecutive
user space calls which are also preemptible
and the userspace to kernel context switch time is added.
> > + * In other cases: Read clock_a twice (before, and after reading clock_b) and
> > + * average these times – to be as close as possible to the time we read clock_b.
>
> Can you please sit down and provide a precise technical description of
> the problem you are trying to solve and explain your proposed solution
> at the conceptual level instead of throwing out random implementations
> every few days?
>

> Thanks,
>
> tglx
>
>