Re: [RFC patch 15/15] LTTng timestamp x86

From: Mathieu Desnoyers
Date: Wed Oct 22 2008 - 13:05:32 EST


* Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx) wrote:
>
>
> On Fri, 17 Oct 2008, Mathieu Desnoyers wrote:
> >
> > Hrm, on such systems
> > - *large* amount of cpus
> > - no synchronized TSCs
> >
> > What would be the best approach to order events ?
>
> My strong opinion has been - for a longish while now, and independently of
> any timestamping code - that we should be seriously looking at basically
> doing essentially a "ntp" inside the kernel to give up the whole idiotic
> notion of "synchronized TSCs". Yes, TSC's are often synchronized, but even
> when they are, we might as well _think_ of them as not being so.
>
> In other words, instead of expecting internal clocks to be synchronized,
> just make the clock be a clock network of independent TSC domains. The
> domains could in theory be per-package (assuming TSC is synchronized at
> that level), but even if we _could_ do that, we'd probably still be better
> off by simply always doing it per-core. If only because then the reading
> would be per-core.
>
> I think it's a mistake for us to maintain a single clock for
> gettimeofday() (well, "getnstimeofday" and the whole "clocksource_read()"
> crud to be technically correct). And sure, I bet clocksource_read() can do
> various per-CPU things and try to do that, but it's complex and pretty
> generic code, and as far as I know none of the clocksources have even
> tried. The TSC clocksource read certainly does not (it just does a very
> similar horrible "at least don't go backwards" crud that the LTTng patch
> suggested).
>
> So I think we should make "xtime" be a per-CPU thing, and add support for
> per-CPU clocksources. And screw that insane "mark_tsc_unstable()" thing.
>
> And if we did it well, we might be able to get good timestamps that way
> too.
>
> Linus
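
To make the "clock network of independent TSC domains" idea concrete,
here is a rough user-space sketch of what one per-core domain might hold.
Everything in it (struct clock_domain, domain_to_ns(), domain_resync(),
the field names) is hypothetical and only illustrates the shape of the
idea. It is not existing kernel code. It assumes each core keeps its own
raw counter plus a sync point agreed with the rest of the network.

```c
#include <stdint.h>

/* Hypothetical per-core clock domain: maps a raw local counter onto a
 * shared timeline using the last sync point agreed with other domains. */
struct clock_domain {
    uint64_t base_raw;  /* raw counter value at the last sync point */
    uint64_t base_ns;   /* agreed global time (ns) at that sync point */
    uint32_t mult;      /* scaled multiplier: ns = (ticks * mult) >> shift */
    uint32_t shift;     /* scale shift paired with mult */
};

/* Convert a raw counter reading from this domain into nanoseconds.
 * Only this domain's own data is read, so the read stays per-core. */
static uint64_t domain_to_ns(const struct clock_domain *d, uint64_t raw)
{
    uint64_t delta = raw - d->base_raw;

    return d->base_ns + ((delta * d->mult) >> d->shift);
}

/* Record a new sync point, e.g. after an ntp-like exchange with a
 * reference domain produced the estimate ref_ns for counter value raw. */
static void domain_resync(struct clock_domain *d, uint64_t raw,
                          uint64_t ref_ns)
{
    d->base_raw = raw;
    d->base_ns = ref_ns;
}
```

The hard part, estimating ref_ns and adjusting mult between domains, is
exactly what the sketch leaves out.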

Yep, it looks like a promising area to look into. I think, however, that
it would be good to first experiment with it as an in-kernel time source
rather than as a tracing time source, so we can use a tracer to make
sure it is stable enough. :-)

Also, we have to wonder whether it's worth sidetracking tracing development
for what I consider to be a "special case for buggy hardware". If we let
development on this specific problem at the kernel level go on its own
and decide to use it for tracing when it's judged good enough, we
(tracing people) can focus on the following steps needed to get a tracer
into Linux, namely buffering, event id management, etc. Given I feel the
need for tracing is relatively urgent for the community, I'd recommend
getting a basic, imperfect timestamping solution in first, and keeping
room for improvement.
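
As an illustration of what such a basic solution can look like, here is
a minimal user-space sketch of the "at least don't go backwards" clamp
mentioned in the quoted text. It is not the code from the LTTng patch.
The names last_ts and trace_clock_monotonic() are made up for the
example, and it assumes the caller has already converted its local
counter to nanoseconds.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Last timestamp handed out to any CPU (hypothetical global state). */
static _Atomic uint64_t last_ts;

/* Return a timestamp that never goes backwards across callers.
 * raw_ns is the caller's local clock reading, in nanoseconds. */
static uint64_t trace_clock_monotonic(uint64_t raw_ns)
{
    uint64_t prev = atomic_load(&last_ts);

    for (;;) {
        /* Local clock is behind (or equal): reuse the last value. */
        if (raw_ns <= prev)
            return prev;
        /* Try to publish our newer value; on failure prev is reloaded
         * with the current last_ts and we retry. */
        if (atomic_compare_exchange_weak(&last_ts, &prev, raw_ns))
            return raw_ns;
    }
}
```

Note that the result is only non-decreasing, not strictly increasing,
and that every CPU contends on the single last_ts variable, which is
where the cost shows up on machines with a large number of cpus.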

I would rather provide tracing for 98% of the machines out there and point
to some documentation telling how to configure the other 1.95% (and feel
sorry for the people who fall in the inevitable 0.05%) than spend
years trying to come up with a complex scheme aiming precisely at this
1.95%.

Mathieu


--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68