Re: [PATCH V2 6/7] perf, x86: Use LBR call stack to get user callchain

From: Stephane Eranian
Date: Wed Oct 24 2012 - 08:10:54 EST


On Wed, Oct 24, 2012 at 1:52 PM, Yan, Zheng <zheng.z.yan@xxxxxxxxx> wrote:
> On 10/24/2012 07:47 PM, Stephane Eranian wrote:
>> On Wed, Oct 24, 2012 at 1:23 PM, Yan, Zheng <zheng.z.yan@xxxxxxxxx> wrote:
>>> On 10/24/2012 04:57 PM, Stephane Eranian wrote:
>>>> On Wed, Oct 24, 2012 at 7:59 AM, Yan, Zheng <zheng.z.yan@xxxxxxxxx> wrote:
>>>>> From: "Yan, Zheng" <zheng.z.yan@xxxxxxxxx>
>>>>>
>>>>> Try enabling the LBR call stack feature if the event requests callchain
>>>>> recording. Try using the LBR call stack to get the user callchain when
>>>>> there is no frame pointer.
>>>>>
>>>>> Signed-off-by: Yan, Zheng <zheng.z.yan@xxxxxxxxx>
>>>>> ---
>>>>> arch/x86/kernel/cpu/perf_event.c | 126 +++++++++++++++++++++--------
>>>>> arch/x86/kernel/cpu/perf_event.h | 7 ++
>>>>> arch/x86/kernel/cpu/perf_event_intel.c | 20 ++---
>>>>> arch/x86/kernel/cpu/perf_event_intel_lbr.c | 3 +
>>>>> include/linux/perf_event.h | 6 ++
>>>>> kernel/events/core.c | 11 ++-
>>>>> 6 files changed, 124 insertions(+), 49 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
>>>>> index 8ae8044..3bf2100 100644
>>>>> --- a/arch/x86/kernel/cpu/perf_event.c
>>>>> +++ b/arch/x86/kernel/cpu/perf_event.c
>>>>> @@ -398,35 +398,46 @@ int x86_pmu_hw_config(struct perf_event *event)
>>>>>
>>>>> if (event->attr.precise_ip > precise)
>>>>> return -EOPNOTSUPP;
>>>>> - /*
>>>>> - * check that PEBS LBR correction does not conflict with
>>>>> - * whatever the user is asking with attr->branch_sample_type
>>>>> - */
>>>>> - if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
>>>>> - u64 *br_type = &event->attr.branch_sample_type;
>>>>> -
>>>>> - if (has_branch_stack(event)) {
>>>>> - if (!precise_br_compat(event))
>>>>> - return -EOPNOTSUPP;
>>>>> -
>>>>> - /* branch_sample_type is compatible */
>>>>> -
>>>>> - } else {
>>>>> - /*
>>>>> - * user did not specify branch_sample_type
>>>>> - *
>>>>> - * For PEBS fixups, we capture all
>>>>> - * the branches at the priv level of the
>>>>> - * event.
>>>>> - */
>>>>> - *br_type = PERF_SAMPLE_BRANCH_ANY;
>>>>> -
>>>>> - if (!event->attr.exclude_user)
>>>>> - *br_type |= PERF_SAMPLE_BRANCH_USER;
>>>>> -
>>>>> - if (!event->attr.exclude_kernel)
>>>>> - *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
>>>>> - }
>>>>> + }
>>>>> + /*
>>>>> + * check that PEBS LBR correction does not conflict with
>>>>> + * whatever the user is asking with attr->branch_sample_type
>>>>> + */
>>>>> + if (event->attr.precise_ip > 1 && x86_pmu.intel_cap.pebs_format < 2) {
>>>>> + u64 *br_type = &event->attr.branch_sample_type;
>>>>> +
>>>>> + if (has_branch_stack(event)) {
>>>>> + if (!precise_br_compat(event))
>>>>> + return -EOPNOTSUPP;
>>>>> +
>>>>> + /* branch_sample_type is compatible */
>>>>> +
>>>>> + } else {
>>>>> + /*
>>>>> + * user did not specify branch_sample_type
>>>>> + *
>>>>> + * For PEBS fixups, we capture all
>>>>> + * the branches at the priv level of the
>>>>> + * event.
>>>>> + */
>>>>> + *br_type = PERF_SAMPLE_BRANCH_ANY;
>>>>> +
>>>>> + if (!event->attr.exclude_user)
>>>>> + *br_type |= PERF_SAMPLE_BRANCH_USER;
>>>>> +
>>>>> + if (!event->attr.exclude_kernel)
>>>>> + *br_type |= PERF_SAMPLE_BRANCH_KERNEL;
>>>>> + }
>>>>> + } else if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
>>>>> + if (!has_branch_stack(event) && x86_pmu.attr_lbr_callstack) {
>>>>> + /*
>>>>> + * user did not specify branch_sample_type,
>>>>> + * try using the LBR call stack facility to
>>>>> + * record call chains in the user space.
>>>>> + */
>>>>> + event->attr.branch_sample_type =
>>>>> + PERF_SAMPLE_BRANCH_USER |
>>>>> + PERF_SAMPLE_BRANCH_CALL_STACK;
>>>>
>>>> You are forcing user level here, but how do you know the user wanted
>>>> ONLY user level callchains?
>>>>
>>>>
>>>
>>> The LBR call stack is used only when the frame pointer approach doesn't work.
>>
>> And where is that determination made?
>
> Check the code that is added to perf_callchain_user and perf_callchain_user32.
>
Are you saying you can run in a mode where you get the kernel call stack via
frame pointers and the user call stack via the LBR call stack for a single event?
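
To make the mode in question concrete, here is a minimal user-space sketch of
the user-side fallback as described in this thread: use the frame-pointer walk
when it yields anything, otherwise fall back to the LBR call stack entries.
This is not the code from the patch. struct sample_ctx, store() and
collect_user_callchain() are hypothetical names made up for illustration, and
the inputs are plain arrays rather than real hardware state. The kernel side of
the chain is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHAIN 64 /* hypothetical cap on recorded entries */
#define LBR_DEPTH 16 /* LBR call stack depth cited in this thread */

/* Hypothetical stand-in for the inputs and output of one sample. */
struct sample_ctx {
	const uint64_t *fp_frames; /* return addresses from the frame-pointer walk */
	size_t fp_nr;              /* 0 means the walk found nothing */
	const uint64_t *lbr_stack; /* LBR call stack entries, most recent first */
	size_t lbr_nr;             /* number of valid LBR entries */
	uint64_t chain[MAX_CHAIN]; /* resulting user callchain */
	size_t chain_nr;
};

/* Append one address to the chain, silently dropping it when full. */
static void store(struct sample_ctx *c, uint64_t ip)
{
	if (c->chain_nr < MAX_CHAIN)
		c->chain[c->chain_nr++] = ip;
}

/*
 * Build the user callchain for one sample.
 * Returns 1 if entries came from the LBR call stack, 0 otherwise.
 */
static int collect_user_callchain(struct sample_ctx *c, uint64_t user_ip)
{
	size_t i, n;

	store(c, user_ip);

	/* Preferred path: the frame-pointer walk produced frames. */
	if (c->fp_nr > 0) {
		for (i = 0; i < c->fp_nr; i++)
			store(c, c->fp_frames[i]);
		return 0;
	}

	/* Fallback: no usable frame pointers, use at most LBR_DEPTH entries. */
	n = c->lbr_nr < LBR_DEPTH ? c->lbr_nr : LBR_DEPTH;
	for (i = 0; i < n; i++)
		store(c, c->lbr_stack[i]);
	return n > 0;
}
```

In this sketch the LBR entries are consulted only when the frame-pointer walk
returns nothing, so a chain deeper than LBR_DEPTH is truncated on the fallback
path.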


>>
>>> I think the kernel has frame pointers in most cases. The second reason is
>>> that the LBR call stack only has 16 entries. I think it's too small to record
>>> both kernel and user call chains.
>>>
>> It's even too small for many object-oriented user programs.
>>
>>> Regards
>>> Yan, Zheng
>>>
>