Re: [PATCH] FS: timerfd: Fix unexpected return value of timerfd_read function.

From: Thomas Gleixner
Date: Mon Aug 19 2019 - 14:30:01 EST


Arul,

On Mon, 19 Aug 2019, Arul Jeniston wrote:

> > hits ktime_get() or whether it happens concurrent on a different CPU.
> > ktime_get() can never use inconsistent tk data for calculating the time.
>
> Agreed. I think, I am not making my point clear here.
> Do you mean to say ktime_get() would always return incremental time
> irrespective of isr and multi-processors?

Yes. The only exception is when the TSC is either jumping or not fully in
sync between cores.

> If yes, this is where, I have difference of understanding.

And your understanding is still wrong. I explain it to you _once_ more:

The side which updates the timekeeper:

- increments the sequence count _BEFORE_ it changes any data. After that
increment the sequence count is odd, i.e. bit 0 is set.

- updates data (base, last, mult, shift ....)

- increments it again _AFTER_ it updated data. After that increment
the sequence count is even, i.e. bit 0 is cleared.

The read out side:

start:
- reads the sequence count
- if sequence count is odd (update in progress) go back to start

- reads base from timekeeper data
- reads TSC and calculates the delta with timekeeper data
(last, mult, shift ...), i.e. timekeeping_get_ns().

- reads the sequence count again.

If it is even and the same as read above, the data is valid and
consistent and the result is returned.

If the sequence count is different to the original value it goes back
to start.

It does not matter at all if timekeeping_get_ns() returns occasionally a
wrong value due to timekeeper data being updated concurrently. The result
is discarded and never returned to the caller. It tries again.

All places which update the timer keeper issue the sequence count increment
protection and are properly serialized against each other. So there is no
occacional point where ktime_get() would return random crap due to being
interrupted by an update or due to a concurrent update on a different CPU.

This is a protection mechanism which is well understood in computer science
(seqlock with lockless readers) and it works in kernel timekeeping for way
more than a decade without any issue except when the underlying hardware
clocksource (TSC in that case) misbehaves. There is no way to protect the
code against this and we are not going to do anything about it simply
because we can't.

The fact that you can observe the (cycles < last) condition is not proving
anything. Just looking at the (cycles < last) condition is wrong. You need
to proof that the result is returned from ktime_get() without a retry
despite the sequence counter being changed. I doubt you can.

If you can prove that the condition is met _AND_ the sequence counter has
NOT changed, then you have proven that the TSC on your machine is not
correctly synchronized or otherwise returning crap values.

You can make up further weird theories about the incorrectness of that
code, but these theories wont become magically true.

Thanks,

tglx