Re: [PATCH] rtc: adapt allowed RTC update error

From: Thomas Gleixner
Date: Thu Dec 03 2020 - 16:06:01 EST


On Thu, Dec 03 2020 at 12:16, Jason Gunthorpe wrote:
> On Thu, Dec 03, 2020 at 04:39:21PM +0100, Thomas Gleixner wrote:
>
>> The logic in sync_cmos_clock() and rtc_set_ntp_time() is different as I
>> pointed out: sync_cmos_clock() hands -500ms to rtc_tv_nsec_ok() and
>> rtc_set_ntp_time() uses +500ms, IOW exactly ONE second difference in
>> behaviour.
>
> I understood this is because the two APIs work differently, rmk
> explained this as:
>
>> 1. kernel/time/ntp.c assumes that all RTCs want to be told to set the
>> time at around 500ms into the second.
>>
>> 2. drivers/rtc/systohc.c assumes that if the time being set is >= 500ms,
>> then we want to set the _next_ second.
>
> ie one path is supposed to round down and one path is supposed to
> round up, so you get to that 1s difference..
>
> IIRC this is also connected to why the offset is signed..

The problem is that it is device specific and therefore having the
offset parameter is a good starting point.

Let's look at the two scenarios:

1) Directly accessible RTC:

tsched    t1                   t2
          write(newsec)        RTC increments seconds

For rtc_cmos/MC1... tinc = t2 - t1 = 500ms

There are RTCs which reset the thing on write so tinc = t2 - t1 = 1000ms

No idea what other variants are out there, but the principle is the
same for all of them.

Let's assume that the event is accurate for now and ignore the fuzz
logic, i.e. tsched == t1

tsched must be scheduled to happen tinc before wallclock increments
seconds so that the RTC increments seconds at the same time.

That means newsec = t1.tv_sec.

So now the fuzz logic for the legacy cmos path does:

    newtime = t1 - tinc;

    if (newtime.tv_nsec < FUZZ)
        newsec = newtime.tv_sec;
    else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ)
        newsec = newtime.tv_sec + 1;
    else
        goto fail;

The first condition handles the case where t1 >= tsched and the second
one where t1 < tsched.
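
To make the two branches concrete, here is a minimal user-space sketch of
the check above in C. It is not the kernel code: the function name
direct_rtc_newsec(), the use of plain nanosecond counts instead of a
timespec, and the -1 return standing in for "goto fail" are all
assumptions made for illustration.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_MSEC 1000000LL

/*
 * Hypothetical helper, not a kernel API.
 * t1_ns:   time at which the write actually happens (ns)
 * tinc_ns: delay between the write and the RTC seconds increment (ns)
 * fuzz_ns: allowed deviation from the ideal point (ns)
 * Returns the second to write, or -1 where the pseudocode does "goto fail".
 */
int64_t direct_rtc_newsec(int64_t t1_ns, int64_t tinc_ns, int64_t fuzz_ns)
{
	int64_t newtime = t1_ns - tinc_ns;
	int64_t sec = newtime / NSEC_PER_SEC;
	int64_t nsec = newtime % NSEC_PER_SEC;

	/* Keep nsec in [0, NSEC_PER_SEC) even if newtime went negative */
	if (nsec < 0) {
		nsec += NSEC_PER_SEC;
		sec--;
	}

	if (nsec < fuzz_ns)			/* timer on time or late */
		return sec;
	if (nsec > NSEC_PER_SEC - fuzz_ns)	/* timer slightly early */
		return sec + 1;
	return -1;				/* too far off, retry */
}
```

With tinc = 500ms and a fuzz of 5ms (an arbitrary value picked for the
example), t1 = 100.502s and t1 = 100.498s both yield newsec = 100, while
t1 = 100.7s fails.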

We need the same logic for rtc_cmos() when the update goes through
the RTC path, which is broken today. See below.

2) I2C/SPI ...

tsched t0                 t1                     t2
       transfer(newsec)   RTC update (newsec)    RTC increments seconds

Let's assume that ttransfer = t1 - t0 is known.

tinc is the same as above = t2 - t1

Again, let's assume that the event is accurate for now and ignore the fuzz
logic, i.e. tsched == t0

So tsched has to be ttot = t2 - t0 _before_ wallclock reaches t2 and
increments seconds.

In this case newsec = t1.tv_sec = (t0 + ttransfer).tv_sec

So now the fuzz logic for this is:

    newtime = t0 + ttransfer;

    if (newtime.tv_nsec < FUZZ)
        newsec = newtime.tv_sec;
    else if (newtime.tv_nsec > NSEC_PER_SEC - FUZZ)
        newsec = newtime.tv_sec + 1;
    else
        goto fail;

Again the first condition handles the case where t0 >= tsched and the
second one where t0 < tsched.
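
As a rough illustration of the I2C/SPI case, the sketch below computes
the timer expiry from ttot = ttransfer + tinc (which follows from
ttransfer = t1 - t0 and tinc = t2 - t1) and then applies the same fuzz
check to t0 + ttransfer. The function names, the nanosecond
representation, the -1 return for "goto fail" and the assumption that
all times are non-negative are mine, not kernel API.

```c
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_MSEC 1000000LL

/*
 * Hypothetical: when the timer must fire so that the RTC increments
 * seconds at t2. ttot = ttransfer + tinc before t2.
 */
int64_t bus_rtc_tsched(int64_t t2_ns, int64_t ttransfer_ns, int64_t tinc_ns)
{
	return t2_ns - (ttransfer_ns + tinc_ns);
}

/*
 * Hypothetical: second to transfer, given the actual start t0 of the
 * transfer. Returns -1 where the pseudocode does "goto fail".
 * Assumes t0_ns and ttransfer_ns are non-negative.
 */
int64_t bus_rtc_newsec(int64_t t0_ns, int64_t ttransfer_ns, int64_t fuzz_ns)
{
	int64_t newtime = t0_ns + ttransfer_ns;	/* this is t1 */
	int64_t sec = newtime / NSEC_PER_SEC;
	int64_t nsec = newtime % NSEC_PER_SEC;

	if (nsec < fuzz_ns)			/* timer on time or late */
		return sec;
	if (nsec > NSEC_PER_SEC - fuzz_ns)	/* timer slightly early */
		return sec + 1;
	return -1;				/* too far off, retry */
}
```

Taking an RTC which resets on write (tinc = 1000ms), an assumed transfer
time of 2ms and t2 = 101s, the timer has to fire at 99.998s and the
value to transfer is newsec = 100.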

So now we have two options to fix this:

1) Use a negative sync_offset for devices which need #1 above
(rtc_cmos & similar)

That requires setting tsched to t2 - abs(sync_offset)

2) Always use a positive sync_offset and a flag which tells
rtc_tv_nsec_ok() whether it needs to add or subtract.

#1 is good enough. All it takes is a comment at the timer start code
explaining why abs() is required.
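
A small sketch of how I read option #1, purely for illustration and not
the eventual patch: the signed offset is added as is when computing the
time to check against the fuzz window, while the timer expiry uses its
absolute value. The names sketch_tsched() and sketch_newtime() and the
nanosecond representation are assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_MSEC 1000000LL

/*
 * Hypothetical: timer expiry relative to a wallclock second boundary t2.
 * abs() is required because the timer always has to fire _before_ the
 * boundary, no matter which sign the device uses for sync_offset.
 */
int64_t sketch_tsched(int64_t t2_ns, int64_t sync_offset_ns)
{
	return t2_ns - llabs(sync_offset_ns);
}

/*
 * Hypothetical: time handed to the fuzz check. The sign of sync_offset
 * selects "t1 - tinc" (negative) or "t0 + ttransfer" (positive).
 */
int64_t sketch_newtime(int64_t now_ns, int64_t sync_offset_ns)
{
	return now_ns + sync_offset_ns;
}
```

For rtc_cmos with sync_offset = -500ms and t2 = 101s this gives a timer
expiry at 100.5s and newtime = 100.0s. With an assumed positive offset
of 2ms the expiry is at 100.998s and newtime = 101.0s.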

Let me hack that up along with the hrtimer muck.

Thanks,

tglx