Re: [RFC patch 15/15] LTTng timestamp x86

From: Mathieu Desnoyers
Date: Wed Oct 22 2008 - 12:20:04 EST


* Ingo Molnar (mingo@xxxxxxx) wrote:
>
> * Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > And if you make all these linear interpolations be per-CPU (so you
> > have per-CPU offsets and frequencies) you never _ever_ need to touch
> > any shared data at all, and you know you can scale basically
> > perfectly.
> >
> > Your linear interpolations may not be _perfect_, but you'll be able to
> > get them pretty damn near. In fact, even if the TSC's aren't
> > synchronized at all, if they are at least _individually_ stable (just
> > running at slightly different frequencies because they are in
> > different clock domains, and/or at different start points), you can
> > basically perfect the precision over time.
>
> there's been code submitted by Michael Davidson recently that looked
> interesting, which turns the TSC into such an entity:
>
> http://lkml.org/lkml/2008/9/25/451
>
> The periodic synchronization uses the hpet, but it thus allows lockless
> and globally correct readouts of the TSC.
>
> And that would match the long term goal as well: the hw should do this
> all automatically. So perhaps we should have a trace_clock() after all,
> independent of sched_clock(), and derived straight from RDTSC.
>
> The approach as proposed has a couple of practical problems, but if we
> could be one RDTSC+multiplication away from a pretty good timestamp that
> would be rather useful, very fast and very robust ...
>
> Ingo

Looking at this code, I wonder:

- How it would support virtualization.
- How it would scale to 512 nodes, given that every idle node does an
  HPET readl each time it exits from safe_halt(). The idle nodes alone
  can end up taking most of the HPET timer bandwidth. With 256 idle
  nodes saturating the HPET and 256 nodes doing useful work, the useful
  nodes could wait a long time whenever they try to resync with the HPET
  (they may need to sample it periodically or at CPU frequency change,
  or they may simply go idle once in a while). We might even end up
  having difficulty getting a CPU out of idle because of the time it
  takes simply to get hold of the HPET.
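
To put a rough shape on that second concern, here is a back-of-the-envelope
model. It is only a sketch under stated assumptions: HPET reads are treated
as fully serialized, each readl is given a fixed cost, and
worst_case_wait_ns() is a hypothetical helper, not something from the
proposed patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: every HPET read is serialized and costs read_ns.
 * A CPU that arrives while `contending_cpus` other CPUs are already
 * queued must wait for all of them before its own read completes.
 */
static uint64_t worst_case_wait_ns(unsigned int contending_cpus,
				   uint64_t read_ns)
{
	assert(read_ns != 0);

	/* Reads ahead of us in the queue, plus our own read. */
	return ((uint64_t)contending_cpus + 1) * read_ns;
}
```

With a placeholder cost of 1000 ns per read (an assumed figure, not a
measurement), worst_case_wait_ns(256, 1000) gives 257000 ns, i.e. a busy
node could stall for a few hundred microseconds on a single resync if all
256 idle nodes happen to hit the HPET at once.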

Given the bad scalability numbers I've recently posted for the HPET, I
doubt this is a workable solution to the scalability issue.
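
For reference, the per-CPU "RDTSC+multiplication" readout discussed above
could look roughly like the sketch below. This is not Michael's code: the
structure, the names and the shift value are all made up for illustration,
and the TSC value is passed in as a plain argument so the arithmetic can be
checked in user space. The resync step that fills in the parameters is
exactly where the HPET (or some other reference) would have to be read.

```c
#include <assert.h>
#include <stdint.h>

#define TRACE_CLOCK_SHIFT 20	/* hypothetical fixed-point shift */

/* Per-CPU parameters, only ever written by the owning CPU. */
struct cpu_clock_params {
	uint64_t tsc_base;	/* TSC value at the last resync */
	uint64_t ns_base;	/* reference time in ns at the last resync */
	uint32_t mult;		/* ns per cycle, scaled by 2^TRACE_CLOCK_SHIFT */
};

/*
 * Derive the scaled ns-per-cycle factor from two resync points.
 * delta_ns must stay below 2^44 so the shift cannot overflow.
 */
static uint32_t calc_mult(uint64_t delta_ns, uint64_t delta_tsc)
{
	assert(delta_tsc != 0);
	assert(delta_ns < (1ULL << 44));

	return (uint32_t)((delta_ns << TRACE_CLOCK_SHIFT) / delta_tsc);
}

/*
 * Linear interpolation: one subtraction, one multiplication, one shift.
 * The product can overflow if resyncs are too far apart, so the resync
 * period has to bound (tsc - tsc_base).
 */
static uint64_t tsc_to_ns(const struct cpu_clock_params *p, uint64_t tsc)
{
	return p->ns_base +
	       (((tsc - p->tsc_base) * p->mult) >> TRACE_CLOCK_SHIFT);
}
```

The read side touches no shared data, which is the property Linus is after.
The cost moves entirely into how often, and against what, each CPU
recomputes tsc_base, ns_base and mult.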

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68