> On Tue, Apr 26 2022 at 11:36, Waiman Long wrote:
>> On 4/25/22 15:24, Thomas Gleixner wrote:
>>> Yes. It's clear that the initial sync overhead is due to the cache line
>>> being remote, but I rather underestimate the compensation. Aside of that
>>> it's not guaranteed that the cache line is actually remote on the first
>>> access. It's by chance, but not by design.
>>
>> In check_tsc_warp(), the unlikely(prev > now) check may only be
>> triggered to record the possible warp if last_tsc was previously written
>> to by another cpu. That requires the transfer of lock cacheline from the
>> remote cpu to local cpu as well. So sync overhead with remote cacheline
>> is what really matters here. I had actually thought about just measuring
>> local cacheline sync overhead so as to underestimate it and I am fine
>> about doing it.
>
> Fair enough, but what I meant is that when estimating the actual sync
> overhead then there is no guarantee that the cache line is remote.
>
> The CPU which does that estimation might have been the last to lock,
> there is no guarantee that the reference CPU locked last or wrote to the
> cache line last.
>
>>> IOW, TSC runs with a constant frequency independent of the actual CPU
>>> frequency, ergo the CPU frequency dependent execution time has an
>>> influence on the resulting compensation value, no?
>>>
>>> On the machine I tested on, it's a factor of 3 between the minimal and
>>> the maximal CPU frequency, which makes quite a difference, right?
>>
>> Yes, I understand that. The measurement of sync_overhead is for
>> estimating the delay (in TSC cycles) that the locking overhead
>> introduces. With 1000MHz frequency, the delay in TSC cycles will be
>> double that of a cpu running at 2000MHz. So you need more compensation
>> in this case. That is why I said that as long as clock frequency doesn't
>> change in the check_tsc_warp() loop and the sync_overhead measurement
>> part of the code, the actual cpu frequency does not matter here.
>
> I grant you that it does not matter for the loop under the assumption
> that the loop runs at constant frequency, but is that a guarantee that
> it does not matter later on?

Yes, that is my point: frequency doesn't matter if it remains the same.
Of course, all bets are off if the frequency really changes.
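
To put numbers on the TSC cycle argument above, here is a minimal sketch.
It is not taken from the patch: core_cycles_to_tsc() and its parameters are
hypothetical names, and it assumes that the locking overhead costs a fixed
number of core cycles while the TSC ticks at a constant rate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a delay given in CPU core cycles into TSC cycles.
 *
 * Assumption for illustration only: the overhead is a fixed number of
 * core cycles, so its duration in TSC cycles scales with tsc_khz/cpu_khz.
 */
uint64_t core_cycles_to_tsc(uint64_t core_cycles,
                            uint64_t cpu_khz, uint64_t tsc_khz)
{
    assert(cpu_khz != 0);   /* a stopped core has no meaningful ratio */
    return core_cycles * tsc_khz / cpu_khz;
}
```

Under that assumption, 100 core cycles at 1000MHz against a 2000MHz TSC
come out as 200 TSC cycles, versus 100 TSC cycles when the core runs at
2000MHz.
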

> If you overcompensate by a factor of 3 because the upcoming CPU ran at
> the lowest frequency, then it might become visible later when everything
> runs at full speed.

I don't think the overhead will be directly proportional to the cpu
frequency. A 3X increase in frequency will certainly lower the overhead,
but it won't drop to 1/3. Maybe 1/2 at most.
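
One way to picture an overhead that is not directly proportional to the
cpu frequency is a two-part cost: one part that scales with the core clock
and one part that does not. The sketch below is only a model under that
assumption. struct sync_cost, both helpers, and every number used with
them are hypothetical, and nothing here was measured.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-part cost model, not derived from any measurement. */
struct sync_cost {
    uint64_t fixed_tsc;    /* part independent of core clock, in TSC cycles */
    uint64_t core_cycles;  /* part that scales with core clock, in core cycles */
};

/* Modelled sync overhead in TSC cycles for a core running at cpu_khz. */
uint64_t sync_overhead_tsc(const struct sync_cost *c,
                           uint64_t cpu_khz, uint64_t tsc_khz)
{
    assert(cpu_khz != 0);
    return c->fixed_tsc + c->core_cycles * tsc_khz / cpu_khz;
}

/*
 * Excess compensation, in TSC cycles, if the value measured at slow_khz
 * is kept while the real cost corresponds to fast_khz.
 */
uint64_t overcompensation_tsc(const struct sync_cost *c, uint64_t slow_khz,
                              uint64_t fast_khz, uint64_t tsc_khz)
{
    assert(slow_khz <= fast_khz);   /* keeps the subtraction non-negative */
    return sync_overhead_tsc(c, slow_khz, tsc_khz) -
           sync_overhead_tsc(c, fast_khz, tsc_khz);
}
```

With a made-up split of 100 fixed TSC cycles plus 100 core cycles on a
3000MHz TSC, going from 1000MHz to 3000MHz lowers the modelled overhead
from 400 to 200 TSC cycles, that is to 1/2 rather than 1/3, and leaves 200
TSC cycles of overcompensation.
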

>> How about we halve the measured sync_overhead as compensation to avoid
>> over-estimation, but probably increase the chance that we need a second
>> adjustment of TSC warp.
>
> Half of what?
>
>> With this patch applied, the measured overhead on the same CooperLake
>> system on different reboot runs varies from 104 to 326.
>
> Half of something which jumps around? Not convinced. :)
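
The arithmetic behind that objection can be sketched as follows. Halving
scales both ends of the reported 104 to 326 range, so the run-to-run
spread stays the same in relative terms. struct range and both helpers are
hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest and highest measured overhead across reboots. */
struct range {
    uint64_t lo;
    uint64_t hi;
};

/* Halve both ends of the range, as the proposed compensation would. */
struct range halve_range(struct range r)
{
    struct range h = { r.lo / 2, r.hi / 2 };
    return h;
}

/* Ratio hi/lo scaled by 100 and truncated, to avoid floating point. */
uint64_t spread_x100(struct range r)
{
    assert(r.lo != 0);
    return r.hi * 100 / r.lo;
}
```

For the reported values, halving gives 52 to 163, and the ratio between
the largest and the smallest value is about 3.13 both before and after.
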

> Btw:
>>> Could you please do that? I really like to see the data points.
>>
>> Yes, I will try that experiment and report back the results.