Re: [RFC] Fast accurate clock readable from user space and NMI handler

From: Mathieu Desnoyers
Date: Tue Feb 27 2007 - 14:05:13 EST


* Daniel Walker (dwalker@xxxxxxxxxx) wrote:
> On Tue, 2007-02-27 at 11:02 -0500, Mathieu Desnoyers wrote:
> > * Daniel Walker (dwalker@xxxxxxxxxx) wrote:
> > > On Tue, 2007-02-27 at 02:38 -0500, Mathieu Desnoyers wrote:
> > >
> > > >
> > > > I am concerned about the automatic fallback to the PIT when no other
> > > > clock source is available. A clocksource read would be atomic when TSC
> > > > or HPET are available, but would fall back on PIT otherwise. There
> > > > should be some way to specify that a caller is only interested in atomic
> > > > clock sources (if none are available, the call should simply return an
> > > > error, or 0).
> > > >
> > > I'm not sure what you mean by using the RCU
> >
> > The original proposal of this thread uses an RCU (read-copy-update) style
> > update of the previous 64-bit counter: it swaps a pointer (atomically)
> > upon update by incrementing a word-sized counter that is used, by the
> > reader, to get the offset in the array (with a modulo operation) for the
> > current readable data and as a way to detect incorrect reads of
> > overwritten information (we re-read the word-sized counter after having
> > read the data structure to make sure it has not been incremented. If we
> > detect an increment, we redo the whole operation).
>
> I didn't see RCU at all in your original message, so I'm not sure how
> you propose to use it .. My understanding of the RCU was that it
> couldn't be used from interrupt context, that could be totally wrong so
> I'll let you explain how you planned to use it.
>

1 - I do not plan to use the rcupdate.h API, because it is oriented
towards allocating/freeing data structures after a quiescent state. I
don't need that. I only want a 64-bit data structure that is always valid
for reading and is updated atomically. Therefore, I keep an array of two
64-bit structures. At all times, one is used as the "readable" value and
the other as the "writable" one. The roles are exchanged at each update.
The word-sized counter is used to select the current read and write
pointers through a mask, and is also used to detect bad reads (if a read
is preempted and two updates then occur, the reader could otherwise read
a bad value without knowing it). By keeping a word-sized counter of the
number of updates, we have 32 or 64 bits (depending on the architecture)
before the counter wraps around, which should not happen even in the far
future.
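
The scheme described above can be sketched in user-space C11 roughly as
follows. This is only an illustration under assumptions, not the posted
patch: the names split64, dual_counter, dual_counter_update and
dual_counter_read are hypothetical, a single writer is assumed, the
64-bit value is stored as two 32-bit halves to mimic a 32-bit machine,
and sequentially consistent atomics are used for simplicity where kernel
code would use explicit barriers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not the posted patch. */

/* A 64-bit value as a 32-bit machine sees it: two separate words. */
struct split64 {
	_Atomic uint32_t lo;
	_Atomic uint32_t hi;
};

struct dual_counter {
	atomic_ulong updates;    /* word-sized count of completed updates */
	struct split64 slot[2];  /* slot[updates & 1] is the readable one */
};

/* Assumption: exactly one writer (e.g. the timer interrupt on one CPU). */
static void dual_counter_update(struct dual_counter *c, uint64_t value)
{
	unsigned long n = atomic_load(&c->updates);
	struct split64 *w = &c->slot[(n + 1) & 1];  /* the non-readable slot */

	atomic_store(&w->lo, (uint32_t)value);
	atomic_store(&w->hi, (uint32_t)(value >> 32));
	atomic_store(&c->updates, n + 1);           /* roles are exchanged here */
}

/* Never blocks and never writes shared state; retries if an update raced. */
static uint64_t dual_counter_read(struct dual_counter *c)
{
	unsigned long before, after;
	uint32_t lo, hi;

	do {
		before = atomic_load(&c->updates);
		lo = atomic_load(&c->slot[before & 1].lo);
		hi = atomic_load(&c->slot[before & 1].hi);
		after = atomic_load(&c->updates);   /* re-read the counter */
	} while (before != after);                  /* incremented: redo it all */

	return ((uint64_t)hi << 32) | lo;
}
```

In this sketch, if the writer is interrupted (for instance by an NMI)
before the final store to updates, a reader running in the handler still
sees the old count and the old, untouched slot, so it returns a
consistent value without waiting for the writer.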



> > > > I still think that an RCU style update mechanism would be a good way to
> > > > fix the current clocksource read issue. Another, slower and non NMI
> > > > safe way to do this would be with a read seqlock and with IRQ disabling.
> > >
> > > , but the pit clocksource
> > > does disable interrupts with a spin_lock_irqsave().
> > >
> >
> > When I say "clocksource read issue", I am talking about a
> > race between the function you proposed earlier, which you say is used in
> > -rt kernels for latency tracing (get_monotonic_cycles), and HPET and TSC
> > "last cycles" updates.
>
> Right .. You said that regular interrupts would cause this non-atomic
> 64-bit update race , but the pit disabled interrupts, and the
> last_cycles update is done with interrupts off .. So I think we're back
> to only the NMI case ..
>
> Did you have another scenario ?
>

__get_nsec_offset : reads clock->cycle_last. Should be called with
xtime_lock held. (ok so far, but see below)

change_clocksource
clock->cycle_last = now; (non-atomic 64-bit update. Not protected by
any lock?) -> this would race with __get_nsec_offset?

update_wall_time
Called from the timer interrupt. Holds xtime_lock and has a priority higher
than other interrupts. Other clock->cycle_last updates are protected by
write_seqlock_irqsave.

get_monotonic_cycles (as you proposed, in -rt kernels):
reads clock->cycle_last. Not protected by any read seqlock and does not
disable interrupts. Races with change_clocksource, update_wall_time and
all other time update functions. For instance, if someone uses
get_monotonic_cycles in process context and the timer interrupt fires
update_wall_time right in the middle of the two 32-bit reads, the value
will be wrong.
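
To make that last case concrete, here is a small deterministic
simulation of the torn read. It is a sketch with hypothetical names
(cycles32, store64, racy_read, timer_tick), it assumes the low half is
read first, and it models the interrupt as a callback invoked between
the two 32-bit reads. It is not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit cycle count held as two 32-bit words. */
struct cycles32 {
	uint32_t lo;
	uint32_t hi;
};

static void store64(struct cycles32 *c, uint64_t v)
{
	c->lo = (uint32_t)v;
	c->hi = (uint32_t)(v >> 32);
}

/*
 * Unprotected read. The irq callback stands in for an interrupt that
 * fires between the two 32-bit reads; pass NULL for an undisturbed read.
 */
static uint64_t racy_read(struct cycles32 *c, void (*irq)(struct cycles32 *))
{
	uint32_t lo = c->lo;

	if (irq)
		irq(c);

	uint32_t hi = c->hi;

	return ((uint64_t)hi << 32) | lo;
}

/* Stand-in for the timer update: the low word wraps, the high word bumps. */
static void timer_tick(struct cycles32 *c)
{
	store64(c, 0x0000000100000000ULL);
}
```

Starting from 0x00000000FFFFFFFF, racy_read() with timer_tick as the
callback combines the old low half with the new high half and returns
0x00000001FFFFFFFF, which is neither the old value nor the new one.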

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68