Re: [RFC PATCH V2 2/9] perf: Extend ABI to support post-processing monotonic raw conversion
From: Liang, Kan
Date:  Mon Mar 13 2023 - 17:20:03 EST
On 2023-03-11 12:55 a.m., John Stultz wrote:
> On Thu, Mar 9, 2023 at 8:56 AM Liang, Kan <kan.liang@xxxxxxxxxxxxxxx> wrote:
>> On 2023-03-08 8:17 p.m., John Stultz wrote:
>>> So I spent a little bit of time today adding some trace_printks to the
>>> timekeeping code so I could record the actual TSC and timestamps being
>>> calculated from CLOCK_MONOTONIC_RAW.
>>>
>>> I did catch one error in the test code, which unfortunately I'm to blame for:
>>>   mid = start + (delta +(delta/2))/2; //round-closest
>>>
>>> That should be
>>>   mid = start + (delta +(2/2))/2  //round-closest
>>> or more simply
>>>   mid = start + (delta +1)/2; //round-closest
>>>
>>> Generalized rounding should be: (value + (DIV/2))/DIV), but I'm
>>> guessing with two as the divisor, my brain mixed it up and typed
>>> "delta". My apologies!
>>>
>>> With that fix, I'm seeing closer to ~500ns of error in the
>>> interpolation, just using the userland sampling.   Now, I've also
>>> disabled vsyscalls for this (otherwise I wouldn't be able to
>>> trace_printk), so the error likely would be higher than with
>>> vsyscalls.
>>>
>>> Now, part of the error is that:
>>>   start= rdtsc();
>>>   clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
>>>   end = rdtsc();
>>>
>>> Ends up looking like
>>>   start= rdtsc();
>>>   clock_gettime() {
>>>      now = rdtsc();
>>>      delta = now - last;
>>>      ns = (delta * mult) >> shift
>>> [~midpoint~]
>>>      ts->nsec = base_ns + ns;
>>>      ts->sec = base_sec;
>>>      normalize_ts(ts)
>>>   }
>>>   end = rdtsc();
>>>
>>> And so by taking the mid-point we're always a little skewed from where
>>> the tsc was actually read.  Looking at the data for my case the tsc
>>> read seems to be ~12% in, so you could instead try:
>>>
>>> delta = end - start;
>>> p12 = start + ((delta * 12) + (100/2))/100;
>>>
>>> With that adjustment, I'm seeing error around ~40ns.
>>>
>>> Mind giving that a try?
>>
>> I tried both the new mid and p12. The error becomes even larger.
>>
>> With new mid (start + (delta +1)/2), the error is now ~3800ns
>> With p12 adjustment, the error is ~6700ns.
>>
>>
>> Here is how I run the test.
>> $./time
>> $perf record -e cycles:upp --clockid monotonic_raw $some_workaround
>> $./time
>>
>> Here are some raw data.
>>
>> For the first ./time,
>> start: 961886196018
>> end: 961886215603
>> MONO_RAW: 341485848531
>>
>> For the second ./time,
>> start: 986870117783
>> end: 986870136152
>> MONO_RAW: 351495432044
>>
>> Here is the time generated from one PEBS record.
>> TSC: 968210217271
>> PEBS_MONO_RAW (calculated via kernel conversion information): 344019503072
>>
>> Using new mid (start + (delta +1)/2), the guessed PEBS_MONO_RAW is
>> 344019506897. The error is 3825ns.
>> Using p12 adjustment, the guessed PEBS_MONO_RAW is 344019509831.
>> The error is 6759ns
> 
> Huh. I dunno. That seems wild that the error increased.
> 
> Just in case something is going astray with the PEBS_MONO_RAW logic,
> can you apply the hack patch I was using to display the MONOTONIC_RAW
> values the kernel calculates?
>   https://github.com/johnstultz-work/linux-dev/commit/8d7896b078965b059ea5e8cc21841580557f6df6
> 
> It uses trace_printk, so you'll have to cat /sys/kernel/tracing/trace
> to get the output.
> 
$ ./time_3
start: 7358368893806 end: 7358368902944 delta: 9138
MONO_RAW: 2899739790738
MID: 7358368898375
P12: 7358368894903
$ sudo cat /sys/kernel/tracing/trace | grep time_3
          time_3-1443    [002] .....  2899.858936: ktime_get_raw_ts64:
JDB: timekeeping_get_delta cycle_now: 7358368897679
          time_3-1443    [002] .....  2899.858937: ktime_get_raw_ts64:
JDB: ktime_get_raw_ts64: 2899739790738
The error between MID and cycle_now is -696ns
The error between P12 and cycle_now is 2776ns
The time_3.c is attached.
Thanks,
Kan#include <sys/time.h>
#include <time.h>
#include <stdio.h>
#include <errno.h>
static inline unsigned long rdtsc ()
{
  unsigned long var;
  unsigned int hi, lo;
  asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
  var = ((unsigned long long int) hi << 32) | lo;
  return var;
}
typedef unsigned long long u64;
int main()
{
	struct timespec ts;
	u64 start, end, delta, mid, p12;
	do {
		start= rdtsc();
		clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
		end = rdtsc();
		delta = end-start;
	} while (delta  > 20000);   // make sure the reads were not preempted
	printf("start: %llu end: %llu delta: %llu\n", start, end, delta);
	printf("MONO_RAW: %llu\n", (u64)ts.tv_sec * 1000000000 + ts.tv_nsec);
	mid = start + (delta + 1)/2; //round-closest
	printf("MID: %llu\n", mid);
	p12 = start + ((delta * 12) + (100/2))/100;
	printf("P12: %llu\n", p12);
}