RE: [PATCH] x86: Export tsc related information in sysfs

From: Dan Magenheimer
Date: Sat May 15 2010 - 18:35:23 EST


> From: Arjan van de Ven [mailto:arjan@xxxxxxxxxxxxx]
(Arjan's comments reordered somewhat)

> But friends don't let friends use rdtsc in application code.

Um, I realize that many people have been burned by this
many times over the years so it is a "hot stove". I also
realize that there are many environments where using
rdtsc risks stepping on landmines. But I (we?) also
know there are many environments now where using rdtsc is
NOT risky at all... and with the vast majority of new
systems soon shipping with Invariant TSC and a single socket
(and even most multiple-socket systems with non-broken
BIOSes passing a warp test), why should past burns outlaw
userland use of a very fast, very useful CPU feature? After
all, CPU designers at both Intel and AMD have spent
a great deal of design effort and transistors to FINALLY
provide an Invariant TSC.
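
For reference, the hardware half of this is already visible from
userland. Below is a minimal sketch, assuming GCC or clang on x86 and
their <cpuid.h> helper __get_cpuid(); the function names are mine and
purely illustrative. It tests CPUID leaf 0x80000007, EDX bit 8, which
is where both vendors report Invariant TSC. Note that this says
nothing about cross-socket sync, BIOS breakage, hotplug or migration,
which is exactly the part only the kernel knows.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>              /* GCC/clang helper: __get_cpuid() */
#endif

/* CPUID.80000007H:EDX bit 8 is the Invariant TSC flag. */
#define INVARIANT_TSC_BIT (1u << 8)

/* Hypothetical helper: decode the flag from a raw EDX value. */
int edx_has_invariant_tsc(unsigned int edx)
{
    return (edx & INVARIANT_TSC_BIT) != 0;
}

/*
 * Hypothetical helper: returns 1 if the CPU advertises Invariant TSC,
 * 0 if it does not, if the leaf is unsupported, or if this is not x86.
 */
int cpu_has_invariant_tsc(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid() returns 0 when the requested leaf is not supported. */
    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx_has_invariant_tsc(edx);
#else
    return 0;
#endif
}
```

A "1" here is necessary but not sufficient for safe userland rdtsc.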

> > The problem is from an app point-of-view there is no vsyscall.
> > There are two syscalls: gettimeofday and clock_gettime. Sometimes,
> > if it gets lucky, they turn out to be very fast and sometimes
> > it doesn't get lucky and they are VERY slow (resulting in a
> > performance hit of 10% or more), depending on a number of factors
> > completely out of the control of the app and even undetectable to the
> > app.
>
> But the point is.. in the case you get that 10% hit.... that is exactly
> the case where tsc would not work either!!!

Yes, understood. But the kernel doesn't expose a "gettimeofday
performance sucks" flag either. If it did (or in the case of
the patch, if tsc_reliable is zero) the application could at least
choose to turn off the 10000-100000 timestamps/second and log
a message saying "you are running on old hardware so you get
fewer features".
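
To make that concrete, here is a rough sketch of what the application
side might look like. It assumes tsc_reliable is exported as a small
text file holding 0 or 1; the caller supplies the path, since the
exact sysfs location is whatever the patch ends up using. The function
and enum names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical timestamp modes an application might choose between. */
enum ts_mode { TS_MODE_OFF = 0, TS_MODE_RDTSC = 1 };

/*
 * Read a 0/1 flag from the file at 'path' (assumed to be wherever the
 * kernel exports tsc_reliable).
 * Returns 1 if nonzero, 0 if zero, -1 if the file is missing or garbled.
 */
int read_tsc_reliable(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = 0;
    int parsed;

    if (!f)
        return -1;
    parsed = (fscanf(f, "%d", &value) == 1);
    fclose(f);
    if (!parsed)
        return -1;
    return value != 0;
}

/*
 * Enable high-rate rdtsc timestamps only when the kernel says the TSC
 * is reliable; otherwise turn them off and say so loudly.
 */
enum ts_mode choose_ts_mode(const char *path)
{
    if (read_tsc_reliable(path) == 1)
        return TS_MODE_RDTSC;
    fprintf(stderr, "TSC not reported reliable: "
                    "high-rate timestamps disabled\n");
    return TS_MODE_OFF;
}
```

An unreadable or missing file is treated the same as 0, so an
application running on an older kernel degrades the same way as one
running on old hardware.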

> just when we're trying to get rid of this constraint by allowing a per
> cpu offset... (this is needed to cope with cpus not powering on at the
> exact same time... including hotplug cpu etc etc)
>
> oh and.. what notification mechanism do you have to notify the
> application that the tsc now is no longer reliable? Such conditions
> can exist... for example due to a CPU being hotplugged, or some SMM
> screwing around and the kernel detecting that or .. or ...

The proposal doesn't provide a notification mechanism (though I'm
not against it)... if the tsc can EVER become unreliable,
tsc_reliable should be 0.

A CPU-hotpluggable system is a good example of a case where
the kernel should expose that tsc_reliable is 0. (I've heard
anecdotally that CPU hotplug into a QPI or HyperTransport system
will have some other interesting challenges, so may require some
special kernel parameters anyway.) Even if tsc_reliable were
only enabled when a "no-cpu_hotplug" kernel parameter is set,
that would still be useful. And with cores-per-socket (and even
nodes-per-socket) going up seemingly every day, multi-socket
systems will likely be an ever smaller percentage of new
systems.

A virtual machine where live migration to another physical machine
may occur is another good example where tsc_reliable should be 0.
Xen now has a VM config feature that says "migration is disallowed"
for this reason; the Invariant TSC flag is always off for a VM
unless this "no_migrate" flag is set (or rdtsc is emulated).
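
Put together, the rule I have in mind is roughly the following. This
is only a sketch of the policy, not code from the patch: the struct,
its fields and the function name are all made up for illustration, and
how the kernel would actually determine each input is a separate
question.

```c
#include <stdbool.h>

/* Hypothetical summary of what the kernel (or hypervisor) knows. */
struct tsc_env {
    bool invariant_tsc;        /* CPU advertises Invariant TSC */
    bool warp_test_passed;     /* no TSC warp seen across CPUs */
    bool cpu_hotplug_possible; /* a CPU could still be hotplugged */
    bool vm_may_migrate;       /* guest may be live-migrated */
    bool rdtsc_emulated;       /* hypervisor traps and emulates rdtsc */
};

/*
 * Returns the value tsc_reliable should report: 1 only if nothing
 * known today could EVER make the TSC unreliable, else 0.
 */
int tsc_reliable_value(const struct tsc_env *e)
{
    if (!e->invariant_tsc || !e->warp_test_passed)
        return 0;
    if (e->cpu_hotplug_possible)
        return 0;
    /* Migration is only acceptable if rdtsc is emulated. */
    if (e->vm_may_migrate && !e->rdtsc_emulated)
        return 0;
    return 1;
}
```

The point is that the answer is conservative: any one "maybe later"
condition forces 0, which is why no notification mechanism is strictly
required.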

> really. Use the vsyscall. If the vsyscall does not do exactly what you
> want, make a better vsyscall.

If this discussion results in a better vsyscall and/or a way
for applications to easily determine (and report loudly) that
the system does NOT provide a good way to do a fast timestamp,
that may be sufficient. But please propose how that will be done
as the current software choices are inadequate and the CPU
designers have finally fixed the problem for the vast majority
of systems. I am already aware of some enterprise software
that is doing its best to guess whether TSC is reliable by
looking at CPU families and socket counts, but this is doomed
to failure in userland and is something that the kernel knows
and should now expose.

Thanks,
Dan