Re: [PATCH 2/4] perf/x86: add support for PERF_SAMPLE_BRANCH_CALL

From: Ingo Molnar
Date: Tue Oct 13 2015 - 09:40:31 EST



* Stephane Eranian <eranian@xxxxxxxxxx> wrote:

> This patch enables support for PERF_SAMPLE_BRANCH_CALL
> on Intel x86 processors. When the processor supports LBR filtering,
> the selection is done in hardware. Otherwise, the filter is
> applied by software. Note that we chose to include zero-length calls
> because they also represent calls.
>
> Signed-off-by: Stephane Eranian <eranian@xxxxxxxxxx>
> ---
> arch/x86/kernel/cpu/perf_event_intel_lbr.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_lbr.c b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> index ad0b8b0..bfd0b71 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_lbr.c
> @@ -555,6 +555,8 @@ static int intel_pmu_setup_sw_lbr_filter(struct perf_event *event)
>  	if (br_type & PERF_SAMPLE_BRANCH_IND_JUMP)
>  		mask |= X86_BR_IND_JMP;
>  
> +	if (br_type & PERF_SAMPLE_BRANCH_CALL)
> +		mask |= X86_BR_CALL | X86_BR_ZERO_CALL;

I'm wondering how frequent zero-length calls are. If they still occur in typical
user-space code, would it make sense to also have a separate branch sampling type
for zero-length calls?

Intel documents zero-length calls as ones that (ab-)use the call instruction to
push the current IP on the stack:

	call next_addr
  next_addr:
	pop %reg

which can take over 10 cycles on certain microarchitectures (and it unbalances
whatever call stack tracking/caching the CPU does as well).

So it might make sense to analyze them separately. I guess that's the reason why
Intel added a separate flag for them in the PMU.
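
To make the distinction concrete, here is a rough user-space sketch of how a
software filter could tell the two kinds of call apart. It only assumes the
plain 5-byte "call rel32" encoding (opcode 0xE8, no prefixes), where a zero
displacement means the target is the next instruction. The enum and the
function name are made up for illustration and are not the names used in
perf_event_intel_lbr.c:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classes, only loosely modelled on the X86_BR_* bits. */
enum br_class {
	BR_NONE = 0,	/* not a near relative call we can decode */
	BR_CALL,	/* ordinary near relative call */
	BR_ZERO_CALL,	/* call whose target is the next instruction */
};

/*
 * Classify the instruction starting at 'code', with 'len' bytes readable.
 * Assumption: only the unprefixed "call rel32" form (0xE8 + 4 bytes) is
 * handled; everything else is reported as BR_NONE.
 */
static enum br_class classify_call(const uint8_t *code, size_t len)
{
	uint32_t rel;

	assert(code != NULL);

	if (len < 5 || code[0] != 0xE8)
		return BR_NONE;

	/* rel32 is stored little-endian, relative to the next instruction. */
	rel = (uint32_t)code[1] |
	      ((uint32_t)code[2] << 8) |
	      ((uint32_t)code[3] << 16) |
	      ((uint32_t)code[4] << 24);

	/* A zero displacement only pushes the return IP and falls through. */
	return rel == 0 ? BR_ZERO_CALL : BR_CALL;
}
```

With a check along these lines the two cases end up in different classes, so
a separate sampling type could select just the zero-length ones.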

Thanks,

Ingo