Re: [PATCH 0/4] x86/platform/UV: Update TSC support

From: Peter Zijlstra
Date: Fri Sep 29 2017 - 12:24:02 EST


On Fri, Sep 29, 2017 at 08:19:22AM -0700, Mike Travis wrote:
> > So I would still like to get clarification on how ART works (or likely
> > doesn't) on your systems. I think for now its fairly prudent to kill
> > detect_art() on UV.
>
> I tested with both detect_art enabled and disabled and didn't notice a
> difference though I wasn't sure what test to run to verify whether it was
> being used or not. (I'd be glad to run some specific test if one can be
> suggested?) The num/denom setting for a 2100Mhz CPU was 168/2 if that
> information helps?

While ART has a ratio to TSC, it too has an absolute relation to it.
Given an ART time stamp we can compute a TSC value and vice versa, this
allows correlating device timestamps (Network, Audio/Video etc..) with
CPU time stamps.

Per detect_art() we have a single system wide offset, namely:

rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset);

But you use TSC_ADJUST to sync between your cabinets, this cannot ever
be right. The ART clock of the other cabinets (those that did not run
detect_art) will have a different offset.

Currently there are only two device drivers that use ART:

drivers/net/ethernet/intel/e1000e/ptp.c: *system = convert_art_to_tsc(sys_cycles);
sound/pci/hda/hda_controller.c: *system = convert_art_to_tsc(tsc_counter);

Outside of that nobody cares, _for_now_.

I'm not sure if there's a means for the CPU to read ART in order to test
this correlation.

Intel SDM Vol 3B 17.17.4 speaks of 'K' with a footnote about TSC_ADJUST
and the VMCS TSC fields. But basically both TSC and ART start at 0 on
power on and given the frequency ratio 'K' is a known for native system
agents.

Again, I would suggest killing detect_art() (and the setting of
X86_FEATURE_ART) on UV systems until things are worked out. Also, given
you have your own distributed clock, I'm thinking you use that on your
own devices, obviating the immediate need for ART.

> > Also, while indeed not strictly required, that TSC_ADJUST==0 test on
> > bootcpu is nice for consumer systems, BIOS did something 'weird' if that
> > is not true. Is something like is_uv_system() available early enough?
>
> My previous version of the patches had me setting a flag that could be
> checked by the tsc_sanitize_first_cpu() function and disable the requirement
> of "TSC == 0 on socket 0" for any arch that specified it.
> (And UV did set that flag.)
>
> But Thomas said it was "hackery" and that TSC being 0 on socket 0 was no
> longer a requirement. So I took it out for this version and made the "TSC
> == 0 on socket 0" no longer the default for any arch.

That's where it comes from. But normal systems really _should_ have it
at 0 and its a useful sanity check IMO. We really want to know when the
BIOS does a funny behind our backs.