Re: [PATCH 3/5] net: fec: initialize clock with 0 rather than current kernel time

From: Vladimir Oltean
Date: Tue Jul 07 2020 - 12:43:42 EST


On Tue, Jul 07, 2020 at 07:07:08PM +0300, Sergey Organov wrote:
> Vladimir Oltean <olteanv@xxxxxxxxx> writes:
> >
> > What do you mean 'no ticking', and what do you mean by 'non-initialized
> > clock' exactly? I don't know if the fec driver is special in any way, do
> > you mean that multiple runs of $(phc_ctl /dev/ptp0 get) from user space
> > all return 0? That is not at all what is to be expected, I think. The
> > PHC is always ticking. Its time is increasing.
>
> That's how it is right now. My point is that it likely shouldn't. Why is
> it ticking when nobody needs it? Does it draw more power due to that?
>
> > What would be that initialization procedure that makes it tick, and
> > who is doing it (and when)?
>
> The user space code that cares, obviously. Most probably some PTP stack
> daemon. I'd say that any set clock time ioctl() should start the clock,
> or yet another ioctl() that enables/disables the clock, whatever.
>

That ioctl doesn't exist, at least not in PTP land. This also addresses
your previous point.

> >
> >> > Whatever the default value of the clock may be, it's bound to be
> >> > confusing for some reason, _if_ the reason why you're investigating it
> >> > in the first place is a driver bug. Also, I don't really see how your
> >> > change to use Jan 1st 1970 makes it any less confusing.
> >>
> >> When I print the clocks in application, I see seconds and milliseconds
> >> part since epoch. With this patch seconds count from 0, that simply
> >> match uptime. Easy to tell from any other (malfunctioning) clock.
> >>
> >
> > It doesn't really match uptime (CLOCK_MONOTONIC). Instead, it is just
> > initialized with zero. If you have fec built as module and you insmod it
> > after a few days of uptime, it will not track CLOCK_MONOTONIC at all.
> >
> > Not to say that there's anything wrong with initializing it with 0. It's
> > just that I don't see why it would be objectively better.
>
> Well, it would have been better for me in my particular quest to find
> the problem, so it rather needs to be shown where initializing with
> kernel time is objectively better.
>
> Moreover, everything else being equal, 0 is always better, just because
> of simplicity.
>
> >
> >> Here is the description of confusion and improvement. I spent half a day
> >> not realizing that I sometimes get timestamps from the wrong PTP clock.
> >
> > There is a suite of tests in tools/testing/selftests/ptp/ which is
> > useful in debugging problems like this.
> >
> > Alternatively, you can write to each individual clock using $(phc_ctl
> > /dev/ptpN set 0) and check your timestamps again. If the timestamps
> > don't nudge, it's clear that the timestamps you're getting are not from
> > the PHC you've written to. Much simpler.
>
> Maybe. Once you do figure there is another clock in the system and/or
> that that clock is offending. In my case /that/ was the hard part, not
> changing that offending clock, once found, to whatever.
>

And my point was that you could have been in a different situation, when
all of your clocks could have been ticking in 1970, so this wouldn't
have been a distiguishing point. So this argument is poor. Using
phc_ctl, or scripts around that, is much more dynamic.

> >
> >> Part of the problem is that kernel time at startup, when it is used for
> >> initialization of the PTP clock, is in fact somewhat random, and it
> >> could be off by a few seconds.
> >
> > Yes, the kernel time at startup is exactly random (not traceable to any
> > clock reference). And so is the PHC.
> >
> >> Now, when in application I get time stamp
> >> that is almost right, and then another one that is, say, 9 seconds off,
> >> what should I think? Right, that I drive PTP clock wrongly.
> >>
> >> Now, when one of those timestamps is almost 0, I see immediately I got
> >> time from wrong PTP clock, rather than wrong time from correct PTP
> >> clock.
> >>
> >
> > There are 2 points to be made here:
> >
> > 1. There are simpler ways to debug your issue than to leave a patch in
> > the kernel, like the "phc_ctl set 0" I mentioned above. This can be
> > considered a debugging patch which is also going to have consequences
> > for the other users of the driver, if applied. We need to consider
> > whether the change in behavior is useful in general.
>
> This does not apply to my particular case as I explained above, and then
> ease with debug is just a nice side-effect of code simplification.
>
> >
> > 2. There are boards out there which don't have any battery-backed RTC,
> > so CLOCK_REALTIME could be ticking in Jan 1970 already, and therefore
> > the PHC would be initialized with a time in 1970. Or your GM might be
> > configured to be ticking in Jan 1970 (there are some applications
> > that only require the network to be synchronized, but not for the
> > time to be traceable to TAI). How does your change make a difference
> > to eliminate confusion there, when all of your clocks are going to be
> > in 1970? It doesn't make a net difference. Bottom line, a clock
> > initialized with 0 doesn't mean it's special in any way. You _could_
> > make that change in your debugging environment, and it _could_ be
> > useful to your debugging, but if it's not universally useful, I
> > wouldn't try to patch the kernel with this change.
>
> If there is nothing special about any value, 0 is the value to choose,
> because of simplicity. Once again, I only explained debugging advantages
> because you've asked about it. It's just a nice side-effect, as it
> often happens to be when one keeps things as simple as possible.
>
> > Please note that, although my comments appear to be in disagreement with
> > your idea, they are in fact not at all. It's just that, if there's a a
> > particular answer to "what time to initialize a PHC with" that is more
> > favourable than the rest (even though the question itself is a bit
> > irrelevant overall), then that answer ought to be enforced kernel-wide,
> > I think.
>
> As everybody, I believe in a set of generic programming principles that
> are not to be violated lightly. KISS is one of the principles I believe,
> and trying to be clever with no apparent reason is one way of violating
> it.
>
> Overall, here is my argument: 0 is simpler than kernel time, so how is
> it useful to initialize PTP with kernel time that is as wrong as a value
> for PTP time as 0?
>

And overall, my argument is: you are making a user-visible change, for
basically no strong reason, other than the fact that you like zero
better. You're trying to reduce confusion, not increase it, right?

I agree with the basic fact that zero is a simpler and more consistent
value to initialize a PHC with, than the system time. As I've already
shown to you, I even attempted to make a similar change to the ptp_qoriq
driver which was rejected. So I hoped that you could bring some better
arguments than "I believe 0 is simpler". Since no value is right, no
value is wrong either, so why make a change in the first place? The only
value in _changing_ to zero would be if all drivers were changed to use
it consistently, IMO.

But I will stop here and let the PTP maintainer make a choice. I only
intervened because I knew what the default answer was going to be.

> Thanks,
> -- Sergey.

Thanks,
-Vladimir