Re: [RFC] Fast accurate clock readable from user space and NMI handler

From: Mathieu Desnoyers
Date: Mon Feb 26 2007 - 22:55:23 EST


* Daniel Walker (dwalker@xxxxxxxxxx) wrote:
> On Mon, 2007-02-26 at 17:14 -0500, Mathieu Desnoyers wrote:
>
>
> > For kernel and user space tracing, those small jumps are very annoying:
> > they can show, in a trace, a fork() on one CPU appearing after the first
> > schedule() of the new thread on the other CPU, so the scheduling causality
> > relationship becomes very hard to follow. This is only one example.
> > Inaccuracy and periodic modification of the clock time (non-monotonic
> > behaviour) can also cause significant error in performance tests, even on
> > UP systems. A monotonic clock, accessible from anywhere in kernel space
> > (including NMI handlers) and from user space, is very useful for
> > performance analysis and, more generally, for timestamping data in
> > per-CPU buffers so it can later be reordered correctly.
>
> What about adding a layer below do_gettimeofday() which just sheds the
> adjustment process? That might be reasonable. The NMI and userspace
> cases aren't very compelling right now; at least, I'm not convinced a
> whole new timing interface is needed.
>
> The latency tracing system in the -rt branch modifies the gettimeofday
> facilities. I'm not sure of its correctness, but it gets called from
> anywhere in the kernel, including NMIs.
>
> Here's the function,
>
> cycle_t notrace get_monotonic_cycles(void)
> {
> 	cycle_t cycle_now, cycle_delta;
>
> 	/* read clocksource: */
> 	cycle_now = clocksource_read(clock);
>
> 	/* calculate the delta since the last update_wall_time: */
> 	cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
>
> 	return clock->cycle_last + cycle_delta;
> }
>
> That looks safe. When converting this to nanoseconds you would still get
> the time adjustments, but they would land all at once instead of in
> little increments.
>
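
(A side note on the conversion you mention: the cycles-to-nanoseconds step
presumably goes through the usual clocksource mult/shift scaling, roughly
as sketched below. NTP adjusts ->mult, which is why the correction would
show up in one lump rather than as gradual increments. This is only an
illustrative sketch, not code from the -rt patch; the function name is
made up.)

static inline s64 cycles_to_ns(struct clocksource *cs, cycle_t cycles)
{
	/* Scale raw cycles by the clocksource's mult/shift pair. */
	u64 ret = (u64)cycles;

	ret = (ret * cs->mult) >> cs->shift;
	return ret;
}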

ouch... if the clocksource used is the PIT on x86:

static cycle_t pit_read(void)
{
	unsigned long flags;
	int count;
	u32 jifs;
	static int old_count;
	static u32 old_jifs;

	spin_lock_irqsave(&i8253_lock, flags);

If an NMI nests over the spinlock (the NMI fires while the same CPU already
holds i8253_lock, and the NMI handler then calls clocksource_read(), which
tries to take the lock again), we have a deadlock.

In addition, clock->cycle_last is a cycle_t, defined as 64 bits on x86. It
is therefore not updated atomically by change_clocksource,
timekeeping_init, timekeeping_resume and update_wall_time. If an NMI fires
right on top of such an update, especially around the 32-bit wrap-around of
the low word, the time it reads will be really fuzzy.
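
For illustration only (this is a sketch, not code from any patch in this
thread, and all the names are made up): one known way to publish such a
64-bit time base so that it is readable from NMI context is a two-copy
"latch" scheme. The writer updates the inactive copy and then flips a
sequence counter; a reader, even an NMI that interrupted the writer, always
finds one fully written copy and never takes a lock:

/*
 * Sketch only.  Assumes kernel types and barriers (cycle_t, u64,
 * smp_wmb/smp_rmb); struct and function names are hypothetical.
 */
struct time_base {
	cycle_t cycle_last;
	u64 ns_base;
};

struct time_latch {
	unsigned int seq;		/* low bit selects the stable copy */
	struct time_base copy[2];
};

static struct time_latch tlatch;

/* Writer side: callers already serialized (e.g. under xtime_lock). */
static void latch_update(cycle_t cycle_last, u64 ns_base)
{
	smp_wmb();			/* order stores from the previous update */
	tlatch.seq++;			/* odd: readers use copy[1] */
	smp_wmb();
	tlatch.copy[0].cycle_last = cycle_last;
	tlatch.copy[0].ns_base = ns_base;
	smp_wmb();
	tlatch.seq++;			/* even: readers use copy[0] */
	smp_wmb();
	tlatch.copy[1].cycle_last = cycle_last;
	tlatch.copy[1].ns_base = ns_base;
}

/*
 * Reader side: lockless, NMI-safe.  If the NMI interrupted the writer
 * on this CPU, seq cannot change underneath us and its low bit selects
 * the copy the writer is *not* touching, so the loop exits on the
 * first pass.
 */
static void latch_read(cycle_t *cycle_last, u64 *ns_base)
{
	unsigned int seq, idx;

	do {
		seq = tlatch.seq;
		smp_rmb();
		idx = seq & 1;
		*cycle_last = tlatch.copy[idx].cycle_last;
		*ns_base = tlatch.copy[idx].ns_base;
		smp_rmb();
	} while (tlatch.seq != seq);
}

Such a scheme would avoid both problems above: no lock on the NMI read
path, and no torn 64-bit update visible to readers.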

Mathieu

> Daniel
>

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68