Re: Re: [PATCH v2 2/2] KVM: x86: use x86_get_freq to get freq for kvmclock

From: Maxim Levitsky
Date: Fri Dec 03 2021 - 02:55:14 EST


On Thu, 2021-12-02 at 23:45 +0100, Peter Zijlstra wrote:
> On Thu, Dec 02, 2021 at 09:19:25AM +0200, Maxim Levitsky wrote:
> > On Thu, 2021-12-02 at 13:26 +0800, zhenwei pi wrote:
> > Note that on my Zen2 machine (3970X), aperf/mperf returns current cpu frequency,
>
> Correct, and it computes it over a random period of history. IOW, it's a
> random number generator.
>
> > 1. It sucks that on AMD, the TSC frequency is calibrated from other
> > clocksources like PIT/HPET, since the result is not exact and varies
> > from boot to boot.
>
> CPUID.15h is supposed to tell us the actual frequency; except even Intel
> find it very hard to actually put the right (or any, really) number in
> there :/ Bribe your friendly AMD engineer with beers or something.

That's what I thought. I asked just in case AMD has some vendor-specific MSRs
that you know about but that we never bothered to support.
I didn't find any in their PRM.
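
For reference, this is how I understand the CPUID.15h number is meant to be
turned into a frequency. It is only a minimal userspace sketch: the helper
names are made up for illustration (they are not kernel functions), and it
assumes GCC/clang's <cpuid.h>. On CPUs that don't implement the leaf, or that
leave the fields at zero, it simply returns 0.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): derive the TSC frequency
 * in Hz from the raw CPUID.15h register values.
 *   eax = denominator of the TSC / core crystal clock ratio
 *   ebx = numerator of that ratio
 *   ecx = nominal core crystal clock frequency in Hz
 * Returns 0 when the leaf does not enumerate enough to compute it.
 */
uint64_t tsc_hz_from_cpuid15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
	if (eax == 0 || ebx == 0 || ecx == 0)
		return 0;
	return (uint64_t)ecx * ebx / eax;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read leaf 0x15 on the current CPU; returns 0 if it is not available. */
uint64_t read_tsc_hz_cpuid15(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid_count() returns 0 if the leaf is above the max leaf. */
	if (!__get_cpuid_count(0x15, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return tsc_hz_from_cpuid15(eax, ebx, ecx);
}
#endif
```

E.g. a 24 MHz crystal with a 250/2 ratio gives 3 GHz. A result of 0 means a
fallback such as calibration against PIT/HPET is still needed.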

>
> > 2. In the guest on AMD, we mark the TSC as unsynchronized always due to the code
> > in unsynchronized_tsc, unless invariant tsc is used in guest cpuid,
> > which is IMHO not fair to AMD as we don't do this for Intel cpus.
> > (look at unsynchronized_tsc function)
>
> Possibly we could treat >= Zen similar to Intel there. Also that comment
> there is hilarious, it talks about multi-socket and then tests
> num_possible_cpus(). Clearly that code hasn't been touched in like
> forever.
Thank you!

>
> > 3. I wish the kernel would export the tsc frequency it found to userspace
> > somewhere in /sys or /proc, as this would be very useful for userspace applications.
> > Currently it can only be found in dmesg if I am not mistaken.
> > I don't mind if such frequency would only be exported if the TSC is stable,
> > always running, not affected by CPUfreq, etc.
>
> Perf exposes it, it's not really convenient if you're not using perf,
> but it can be found there.
That is good to know! I will check out the source, but if you remember,
is there a CLI option in perf to show it, or does perf only use it for
internal purposes?
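
My guess (not verified against the perf source yet) is that this refers to
the time_mult/time_shift fields in the perf mmap page, which are valid when
cap_user_time is set. If so, something like the sketch below should recover
the frequency from userspace. The function names are made up, the dummy
software event is just my choice of a cheap event to open, and the result is
only as precise as the 32-bit multiplier allows.

```c
#include <stdint.h>

/*
 * perf converts cycles to ns as (cyc * time_mult) >> time_shift, so one
 * second worth of cycles is (10^9 << time_shift) / time_mult.
 * Returns 0 for values this sketch does not handle.
 */
uint64_t tsc_hz_from_perf(uint32_t time_mult, uint16_t time_shift)
{
	if (time_mult == 0 || time_shift > 31)
		return 0;
	return ((uint64_t)1000000000 << time_shift) / time_mult;
}

#ifdef __linux__
#include <linux/perf_event.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a dummy event on the current task and read its mmap header page. */
uint64_t read_tsc_hz_perf(void)
{
	struct perf_event_attr attr;
	struct perf_event_mmap_page *pc;
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	uint64_t hz = 0;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_SOFTWARE;
	attr.config = PERF_COUNT_SW_DUMMY;
	attr.exclude_kernel = 1;

	fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return 0;

	/* Map only the metadata page; no ring buffer is needed here. */
	pc = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
	if (pc != MAP_FAILED) {
		/*
		 * A robust reader would retry around pc->lock (a seqcount);
		 * this sketch reads the fields once.
		 */
		if (pc->cap_user_time)
			hz = tsc_hz_from_perf(pc->time_mult, pc->time_shift);
		munmap(pc, page);
	}
	close(fd);
	return hz;
}
#endif
```

If cap_user_time is not set (which I would expect when the TSC is not
considered stable), read_tsc_hz_perf() returns 0 and there is nothing to
report.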

>
>
> ---
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 2e076a459a0c..09da2935534a 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -29,6 +29,7 @@
> #include <asm/intel-family.h>
> #include <asm/i8259.h>
> #include <asm/uv/uv.h>
> +#include <asm/topology.h>
>
> unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */
> EXPORT_SYMBOL(cpu_khz);
> @@ -1221,9 +1222,20 @@ int unsynchronized_tsc(void)
>  	 * Intel systems are normally all synchronized.
>  	 * Exceptions must mark TSC as unstable:
>  	 */
> -	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) {
> +	switch (boot_cpu_data.x86_vendor) {
> +	case X86_VENDOR_INTEL:
> +		/* Really only Core and later */
> +		break;
> +
> +	case X86_VENDOR_AMD:
> +	case X86_VENDOR_HYGON:
> +		if (boot_cpu_data.x86 >= 0x17) /* >= Zen */
> +			break;
> +		fallthrough;
> +
> +	default:
>  		/* assume multi socket systems are not synchronized: */
> -		if (num_possible_cpus() > 1)
> +		if (topology_max_packages() > 1)
>  			return 1;
>  	}
>

This makes sense!

>


Best regards,
Maxim Levitsky