Re: [PATCH] perf/x86/intel: Fix n_metric for the canceled group

From: Kim Phillips
Date: Fri Oct 02 2020 - 17:10:54 EST


On 10/2/20 8:16 AM, Liang, Kan wrote:
> On 10/2/2020 7:02 AM, Peter Zijlstra wrote:
>> On Wed, Sep 30, 2020 at 07:29:35AM -0700, kan.liang@xxxxxxxxxxxxxxx wrote:
>>> From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>>>
>>> When a group that has TopDown members is failed to be scheduled, any
>>> later TopDown groups will not return valid values.
>>>
>>> Here is an example.
>>>
>>> A background perf that occupies all the GP counters and the fixed
>>> counter 1.
>>>   $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles,
>>>                   cycles,cycles}:D" -a
>>>
>>> A user monitors a TopDown group. It works well, because the fixed
>>> counter 3 and the PERF_METRICS are available.
>>>   $perf stat -x, --topdown -- ./workload
>>>     retiring,bad speculation,frontend bound,backend bound,
>>>     18.0,16.1,40.4,25.5,
>>>
>>> Then the user tries to monitor a group that has TopDown members.
>>> Because of the cycles event, the group is failed to be scheduled.
>>>   $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound,
>>>                       topdown-fe-bound,topdown-bad-spec,cycles}'
>>>                       -- ./workload
>>>      <not counted>,,slots,0,0.00,,
>>>      <not counted>,,topdown-retiring,0,0.00,,
>>>      <not counted>,,topdown-be-bound,0,0.00,,
>>>      <not counted>,,topdown-fe-bound,0,0.00,,
>>>      <not counted>,,topdown-bad-spec,0,0.00,,
>>>      <not counted>,,cycles,0,0.00,,
>>>
>>> The user tries to monitor a TopDown group again. It doesn't work anymore.
>>>   $perf stat -x, --topdown -- ./workload
>>>
>>>      ,,,,,
>>>
>>> In a txn, cancel_txn() is to truncate the event_list for a canceled
>>> group and update the number of events added in this transaction.
>>> However, the number of TopDown events added in this transaction is not
>>> updated. The kernel will probably fail to add new Topdown events.
>>>
>>> Check if the canceled group has Topdown events. If so, subtract the
>>> TopDown events from n_metric accordingly.
>>>
>>> Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown metrics")
>>> Reported-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>>> Reviewed-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>>> Signed-off-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
>>> ---

>>
>> Urgh, I'd much rather we add n_txn_metric. But also, while looking at
>> this, don't we have the same problem with n_pair ?
>>
>> Something like this perhaps...
>>
>
> Sure. For the perf metric, the below patch fixes the problem.
>
> Tested-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>

Tested-by: Kim Phillips <kim.phillips@xxxxxxx>

Excerpt from test script:

sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 &

# should succeed:
sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload

# should fail:
sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload

# previously failed, now succeeds with this patch:
sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload

Thanks both,

Kim