Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()

From: Stephane Eranian
Date: Tue Nov 24 2020 - 03:19:48 EST


Hi,

Another remark on the PEBS drain code. It seems to me that one test
is not quite correct:
intel_pmu_drain_pebs_nhm()
{
	...
	if (p->status != (1ULL << bit)) {
		for_each_set_bit(i, (unsigned long *)&pebs_status, size)
			error[i]++;
		continue;
	}

The kernel cannot disambiguate when 2+ PEBS counters overflow at the
same time, which is what the comment for this code suggests.
However, the comparison is done against the unfiltered p->status,
which is a copy of IA32_PERF_GLOBAL_STATUS at the time of
the sample. This register contains more than the PEBS counter overflow
bits, and any of those other bits could also be set.

Shouldn't this test use pebs_status instead (which covers only the
PEBS counters)?

if (pebs_status != (1ULL << bit)) {
}
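
To illustrate the concern, here is a minimal user-space sketch (not
kernel code). The mask value, the extra status bit, and the helper
names are all made up for the example; the only assumption taken from
the code above is that pebs_status holds just the PEBS counter bits of
p->status.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration: counters 0-3 are PEBS counters. */
#define PEBS_COUNTER_MASK	0xfULL
/* Hypothetical non-PEBS bit that happens to be set in the status copy. */
#define OTHER_STATUS_BIT	(1ULL << 32)

/* Test as currently written: compares the unfiltered status word. */
static int is_ambiguous_unfiltered(uint64_t status, int bit)
{
	assert(bit >= 0 && bit < 64);
	return status != (1ULL << bit);
}

/* Suggested test: compare only the PEBS counter bits. */
static int is_ambiguous_filtered(uint64_t status, int bit)
{
	uint64_t pebs_status = status & PEBS_COUNTER_MASK;

	assert(bit >= 0 && bit < 64);
	return pebs_status != (1ULL << bit);
}
```

With status = (1ULL << 2) | OTHER_STATUS_BIT and bit = 2, only one PEBS
counter has overflowed, yet is_ambiguous_unfiltered() returns 1 (the
record would be counted as an error) while is_ambiguous_filtered()
returns 0.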

Or am I missing something?
Thanks.


On Tue, Nov 24, 2020 at 12:09 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Nov 24, 2020 at 02:01:39PM +0900, Namhyung Kim wrote:
>
> > Yes, it's not about __intel_pmu_pebs_event(). I'm looking at
> > intel_pmu_drain_pebs_nhm() specifically. There's code like
> >
> > 	/* log dropped samples number */
> > 	if (error[bit]) {
> > 		perf_log_lost_samples(event, error[bit]);
> >
> > 		if (perf_event_account_interrupt(event))
> > 			x86_pmu_stop(event, 0);
> > 	}
> >
> > 	if (counts[bit]) {
> > 		__intel_pmu_pebs_event(event, iregs, base,
> > 				       top, bit, counts[bit],
> > 				       setup_pebs_fixed_sample_data);
> > 	}
> >
> > There's a path to x86_pmu_stop() when an error bit is on.
>
> That would seem to suggest you try something like this:
>
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 31b9e58b03fe..8c6ee8be8b6e 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1945,7 +1945,7 @@ static void intel_pmu_drain_pebs_nhm(struct pt_regs *iregs, struct perf_sample_d
> if (error[bit]) {
> perf_log_lost_samples(event, error[bit]);
>
> - if (perf_event_account_interrupt(event))
> + if (iregs && perf_event_account_interrupt(event))
> x86_pmu_stop(event, 0);
> }
>