Re: [PATCH 03/12] perf/x86/intel: Add perf core PMU support for Sapphire Rapids

From: Liang, Kan
Date: Tue Jan 26 2021 - 11:23:53 EST




On 1/26/2021 10:37 AM, Peter Zijlstra wrote:
On Tue, Jan 19, 2021 at 12:38:22PM -0800, kan.liang@xxxxxxxxxxxxxxx wrote:
@@ -1577,9 +1668,20 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
}
if (format_size & PEBS_DATACFG_MEMINFO) {
+ if (sample_type & PERF_SAMPLE_WEIGHT) {
+ u64 weight = meminfo->latency;
+
+ if (x86_pmu.flags & PMU_FL_INSTR_LATENCY)
+ weight >>= PEBS_CACHE_LATENCY_OFFSET;
+ data->weight = weight & PEBS_LATENCY_MASK ?:
intel_get_tsx_weight(meminfo->tsx_tuning);
+ }
+
+ if (sample_type & PERF_SAMPLE_WEIGHT_EXT) {
+ data->weight_ext.val = 0;
+ if (x86_pmu.flags & PMU_FL_INSTR_LATENCY)
+ data->weight_ext.instr_latency = meminfo->latency & PEBS_LATENCY_MASK;
+ }
if (sample_type & PERF_SAMPLE_DATA_SRC)
data->data_src.val = get_data_src(event, meminfo->aux);

Talk to me about that SAMPLE_WEIGHT stuff.... I'm not liking it.

Sure you want multiple dimensions, but urgh.

Also, afaict, as proposed you're wasting 80/128 bits. That is, all data
you want to export fits in a single u64 and yet you're using two, which
is mighty daft.

Sure, pebs::lat / pebs_meminfo::latency is defined as a u64, but you
can't tell me that that is ever actually more than 4G cycles. Even the
TSX block latency is u32.

So how about defining SAMPLE_WEIGHT_STRUCT which uses the exact same
data as SAMPLE_WEIGHT but unions it with a struct. I'm not sure if we
want:

union sample_weight {
u64 weight;

struct {
u32 low_dword;
u32 high_dword;
};

/* or */

struct {
u32 low_dword;
u16 high_word;
u16 higher_word;
};
};

Then have the core code enforce SAMPLE_WEIGHT ^ SAMPLE_WEIGHT_STRUCT and
make the existing code never set the high dword.

So the kernel will only accept either SAMPLE_WEIGHT type or SAMPLE_WEIGHT_STRUCT type. It should error out if both types are set, right?

I will check if u32 is enough for meminfo::latency on the previous platforms.

Thanks,
Kan