Re: [RFC patch 15/15] LTTng timestamp x86

From: Linus Torvalds
Date: Mon Oct 20 2008 - 18:07:33 EST




On Mon, 20 Oct 2008, john stultz wrote:
>
> I'm not quite sure I followed your per-cpu xtime thoughts. Could you
> explain further your thinking as to why the entire timekeeping
> subsystem should be per-cpu instead of just keeping that back in the
> arch-specific clocksource implementation? In other words, why keep
> things synced at the nanosecond level instead of keeping the per-cpu
> TSC synched at the cycle level?

I don't think you can keep them sync'ed without taking frequency drift into
account. When you have multiple boards (ie big boxes), they simply _will_
be in different clock domains. They won't have the exact same frequency.
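
To put a rough number on that drift, here is a minimal sketch (not kernel
code; the function name and the ppm figure in the usage note are
illustrative assumptions) of how far two free-running counters diverge when
their frequencies differ by a few parts per million:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: skew (in ns) accumulated between two free-running
 * counters whose frequencies differ by diff_ppm parts per million, after
 * elapsed_ns nanoseconds of real time.
 */
int64_t sketch_skew_ns(int64_t elapsed_ns, int64_t diff_ppm)
{
	assert(elapsed_ns >= 0);
	return elapsed_ns * diff_ppm / 1000000;
}
```

With an assumed 10 ppm difference, sketch_skew_ns(1000000000, 10) gives
10000, i.e. ten microseconds of skew per second, which dwarfs cycle-level
resolution.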

So the "rewrite the TSC every once in a while" approach (where "after
coming out of idle" is just a special case of "once in a while" due to
many CPU's losing TSC in idle) works well in the kind of situation where
you really only have a single clock domain, and the TSC's are all
basically from the same reference clock. And that's a common case, but it
certainly isn't the _only_ case.

What about fundamentally different frequencies (old TSC's that change with
cpufreq)? Or what about just subtly different ones (new TSC's but on
separate sockets that use separate external clocks)?

But sure, I can imagine using a global xtime, but just local TSC offsets
and frequencies, and just generating a local offset from xtime. BUT HOW DO
YOU EXPECT TO DO THAT?

Right now, the global xtime offset thing also depends on the fact that we
have a single global TSC offset! That whole "delta against xtime" logic
depends very much on this:

	/* calculate the delta since the last update_wall_time: */
	cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;

and that base-time setting depends on a _global_ clock source. Why?
Because it depends on setting that in sync with updating xtime.
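
A simplified, self-contained sketch of that dependency follows. The struct
and function names are hypothetical stand-ins, not the kernel's actual
definitions; only the cycle_delta line mirrors the snippet above, and the
mult/shift conversion is an assumption about how cycles become nanoseconds:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the global clocksource state. */
struct sketch_clock {
	uint64_t cycle_last;	/* counter value captured when xtime was updated */
	uint64_t mask;		/* counter width mask */
	uint32_t mult;		/* cycles -> ns multiplier */
	uint32_t shift;		/* cycles -> ns shift */
};

/* Nanoseconds elapsed since the last xtime update, per the delta logic. */
uint64_t sketch_ns_since_update(const struct sketch_clock *clock,
				uint64_t cycle_now)
{
	uint64_t cycle_delta;

	assert(clock->shift < 64);
	/* Only meaningful if cycle_now comes from the same counter as cycle_last. */
	cycle_delta = (cycle_now - clock->cycle_last) & clock->mask;
	return (cycle_delta * clock->mult) >> clock->shift;
}
```

If cycle_last was sampled on one CPU and cycle_now is read on another whose
counter is offset by 300 cycles, the result is off by those 300 cycles:
the single cycle_last silently assumes a single counter.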

And maybe I'm missing something. But I do not believe that it's easy to
just make the TSC be per-CPU. You need per-cpu correction factors, but you
_also_ need a per-CPU time base.
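
As a rough illustration of what "correction factor plus time base" per CPU
could look like (a sketch only; every name here is hypothetical, and how the
per-CPU values would be set and kept consistent is exactly the hard part
this sketch leaves out):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-CPU state: its own base, not a shared cycle_last. */
struct sketch_cpu_timebase {
	uint64_t base_ns;	/* time (ns) at which cycle_last was sampled */
	uint64_t cycle_last;	/* this CPU's counter value at base_ns */
	uint32_t mult;		/* this CPU's cycles -> ns multiplier */
	uint32_t shift;		/* this CPU's cycles -> ns shift */
};

/* Local time in ns, computed purely from this CPU's own base and rate. */
uint64_t sketch_local_ns(const struct sketch_cpu_timebase *tb,
			 uint64_t cycle_now)
{
	uint64_t cycle_delta;

	assert(tb->shift < 64);
	cycle_delta = cycle_now - tb->cycle_last;
	return tb->base_ns + ((cycle_delta * tb->mult) >> tb->shift);
}
```

For example, a CPU counting at 1 cycle/ns (mult 1024, shift 10) and another
at 2 cycles/ns (mult 512, shift 10), with unrelated cycle_last values, both
report the same nanosecond value 100 ns after a common base_ns.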

Oh, I'm sure you can do hacky things, and work around known issues, and
consider the TSC to be globally stable in a lot of common scenarios.
That's what you get by re-syncing after idle etc. And it's going to work
in a lot of situations.

But it's not going to solve the "hey, I have 512 CPU's, they are all on
different boards, and no, they are _not_ synchronized to one global
clock!".

That's why I'd suggest making _purely_ local time, and then aiming for
something NTP-like. But maybe there are better solutions out there.
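
The mail does not spell out what "NTP-like" would mean in practice. One
possible reading, sketched here purely as an assumption, is that a CPU
estimates its offset against a peer using the standard NTP four-timestamp
exchange. The function and parameter names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standard NTP offset estimate from one request/response exchange:
 *   offset = ((t_recv_remote - t_sent_local) +
 *             (t_sent_remote - t_recv_local)) / 2
 * Positive means the remote clock is ahead of the local one. The estimate
 * is exact only when the two one-way delays are equal.
 */
int64_t sketch_ntp_offset_ns(int64_t t_sent_local, int64_t t_recv_remote,
			     int64_t t_sent_remote, int64_t t_recv_local)
{
	assert(t_recv_local >= t_sent_local);
	return ((t_recv_remote - t_sent_local) +
		(t_sent_remote - t_recv_local)) / 2;
}
```

With a remote clock 50 ns ahead and a symmetric 10 ns one-way delay, the
timestamps (100, 160, 165, 125) yield an offset of 50.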

Linus