Re: [BUG] perf/x86/intel/pebs: PEBS timestamps overwritten

From: Liang, Kan
Date: Fri Aug 05 2022 - 09:37:06 EST




On 2022-08-05 6:49 a.m., Stephane Eranian wrote:
> Hi,
>
> I was alerted by an internal user that the PEBS TSC-based timestamps
> do not appear
> correctly in the final perf.data output file from perf record.
>
> After some investigation, I came to the conclusion that indeed the
> data->time field setup
> by PEBS in the setup_pebs_fixed_sample_data() is later overwritten by
> perf_events generic
> code in perf_prepare_sample(). There is an ordering problem here.
>
> Looking around we found that this problem had been uncovered back in
> May 2020 and a
> patch had been posted then:
> https://lore.kernel.org/lkml/e754b625-bf14-8f5f-bd1a-71d774057005@xxxxxxxxx/T/
>
> However this patch was never commented upon or committed.
>
> The problem is still present in the upstream code today.
>
> 1. perf_sample_data_init()
> 2. setup_pebs_fixed_sample_data(): data->time =
> native_sched_clock_from_tsc(pebs->tsc);
> 3. perf_prepare_sample(): data->time = perf_event_clock(event);
>
> The patch from 2020 (Andreas Kogler) fixes the problem by making the
> assignment in 3.
> conditioned to the value of data->time being 0. Andreas also suggested
> an alternative which
> would break up the call to perf_event_ouput() like this is done in the
> BTS code allowing
> the prepare_sample() call to be made before PEBS samples are
> extracted. That would
> generate some code duplication. Although this approach appears more
> robust, the one
> issue I see is that prepare_sample may need data that would be filled
> by PEBS and
> therefore it would need to be called afterwards.
>
> Any better ideas?

I think Andreas's patch is the most straightforward and simplest patch
to fix the issue. But, if I recall correctly, Peter prefers to minimize
the cachelines touched by the perf_sample_data_init(). So initializing
the data->time in the perf_sample_data_init() seems break the rule.

I think HW will provide more and more such kind of precise information.
Maybe we can use a flag variable to track whether the information is
already provided to avoid the overwritten.

Here are the two patches (Not test yet). The first one is to fix the
timestamp issue. The second one is to use the flag to replace
data->br_stack = NULL; as an examples. I think we can further move other
variables out of the perf_sample_data_init() to minimize the cachelines
touched by the perf_sample_data_init().