Re: [RFC patch 15/15] LTTng timestamp x86

From: Steven Rostedt
Date: Fri Oct 17 2008 - 13:25:31 EST

On Thu, Oct 16, 2008 at 07:19:48PM -0700, Luck, Tony wrote:
> > This cache-line bouncing global clock is a best-effort to provide
> > correct event order in the trace on architectures with unsync tsc. It's
> > actually better than a global tracing buffer because it limits the
> > number of cache line transfers required to one per event.
> Even one line bouncing between cpus can be a performance disaster.
> You'll probably hit a serious wall somewhere between 8 and 16
> cpus (ia64 has code that looks a lot like this in the gettimeofday()
> path because it does not synchronize cpu cycle counters ... some
> applications that are overly fond of timestamping internal
> events using gettimeofday() end up spending significant time
> doing so on large systems ... even with only a few thousands
> of calls per second).

I agree that one bouncing cache line is devastating to performance. But
as Mathieu said, it is better than a global tracer with lots of bouncing
going on. My logdev tracer (something similar to ftrace, but used only
for debugging) used to have a single buffer. By moving it to a per-cpu
buffer and using an atomic counter to sort the events, the speedup was
a few orders of magnitude.
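
To illustrate the idea (this is not logdev's actual code, just a minimal
userspace sketch in C11), each CPU writes only to its own buffer and the
only shared write is one atomic increment per event. All names here
(trace_log, trace_merge, trace_seq, NR_CPUS, BUF_EVENTS) are hypothetical,
and the sketch assumes a single writer per buffer and simply drops events
when a buffer is full:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS    4    /* hypothetical CPU count for the sketch */
#define BUF_EVENTS 64   /* hypothetical per-cpu buffer capacity */

struct trace_event {
	unsigned long seq;  /* global order, taken from the shared counter */
	int cpu;
	int data;
};

struct cpu_buffer {
	struct trace_event ev[BUF_EVENTS];
	size_t len;
};

static struct cpu_buffer buffers[NR_CPUS];

/* The one cache line that bounces between CPUs: one transfer per event. */
static atomic_ulong trace_seq;

/* Record an event in the caller's own buffer (assumes one writer per cpu). */
static void trace_log(int cpu, int data)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	struct cpu_buffer *b = &buffers[cpu];
	if (b->len == BUF_EVENTS)
		return;  /* buffer full: drop the event */

	struct trace_event *e = &b->ev[b->len++];
	e->seq = atomic_fetch_add(&trace_seq, 1);
	e->cpu = cpu;
	e->data = data;
}

static int cmp_seq(const void *a, const void *b)
{
	const struct trace_event *x = a, *y = b;

	return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Gather all per-cpu events into out[] and sort them by sequence number. */
static size_t trace_merge(struct trace_event *out, size_t max)
{
	size_t n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (size_t i = 0; i < buffers[cpu].len && n < max; i++)
			out[n++] = buffers[cpu].ev[i];

	qsort(out, n, sizeof(*out), cmp_seq);
	return n;
}
```

The ordering comes from the counter, not from any timestamp, so it stays
correct even when the TSCs disagree. The cost is that every event still
pulls the trace_seq cache line to the logging CPU.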

ftrace does not have a global counter, but on some boxes with out of
sync TSCs, it could not find race conditions. I had to pull in logdev,
which found the race right away, because of this atomic counter.

logdev adds a bit of performance degradation, but for debugging, I don't
care, and it has helped me quite a bit.

ftrace can help in debugging most of the time, but on some boxes with
wacky time stamps, it is useless for finding races between CPUs. But
ftrace is for production, and cannot afford the performance penalty of
a global counter.

-- Steve
