Re: [PATCH 02/16] perf: Unified API to record selective sets of arch registers

From: Stephane Eranian
Date: Wed May 02 2012 - 08:36:29 EST


On Wed, May 2, 2012 at 2:26 PM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> On Wed, May 02, 2012 at 02:00:23PM +0200, Stephane Eranian wrote:
>> Sorry for the delay, had higher priority tasks to do.
> hi,
> np at all :)
> I just sent v3, but I answered some of your comments below
>
> thanks,
> jirka
>
>
>> [+asharma]
>>
>> On Thu, Apr 26, 2012 at 5:28 PM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>> > On Mon, Apr 23, 2012 at 12:33:50PM +0200, Jiri Olsa wrote:
>> >> On Mon, Apr 23, 2012 at 12:10:57PM +0200, Stephane Eranian wrote:
>> >> > On Tue, Apr 17, 2012 at 1:17 PM, Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>> >
>> > SNIP
>> >
>> >> > How are you going to deal with 32-bit binaries sampled on a 64-bit system?
>> >>
>> >> I don't have the solution right now... but it seems compat tasks need more
>> >> thinking before going ahead with this patchset, since it's going to affect
>> >> the perf_event_attr and could bite us in the future.
>> > hi,
>> > got more info on the compat task unwind
>> >
>> > - for 32 bit task running under 64 bit env. the 64 bits user
>> >   registers values are stored on kernel stack when entering
>> >   the kernel via exception or interrupt, like for native
>> >   64 bit task
>> >
>> You mean the 32-bit registers are stored on the kernel stack,
>> right? Or you mean 64-bit and the upper 32 are guaranteed 0.
>
> I meant 64 bit registers are stored on stack the same way
> as for native process. There are different code paths for
> exception, but same registers' saved stack layout.
>
> So if there's an event within the compat task, you still get
> 64 bit registers saved on stack as if the event happened
> in native process.
>
> The upper 32 are probably 0, but I'm not sure that's guaranteed.
>
>>
>>
>> >   So I think we can keep the current interface as far as
>> >   compat tasks are concerned, since we will get 64 bits
>> >   registers all the time anyway.
>> >
>> >   The place that will take care of compat task unwind
>> >   is the post processing unwind.
>> >
>> >   For each processed sample we:
>> >       - get the sample and translate IP into MAP and DSO
>> >       - read DSO ELF class and figure out whether we deal with
>> >         64 or 32 bit task
>> >       - run libunwind interface with proper task class info,
>> >         which gets us to next bullet:
>> >
>> > - 64 bit libunwind does not support unwind of 32 bit tasks ;)
>> >   so unless that changes, I can see just one hacky way of doing
>> >   this via 32 bit libunwind being loaded in separate 32 bit
>> >   process and doing remote unwind for us..
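[Editorial sketch, not part of the original thread.] The "read DSO ELF
class" step in the list above could look roughly like the following. It
only inspects the ELF identification bytes at the start of the file. The
helper name and the SK_ prefixed constants are hypothetical; the numeric
values mirror the ELF specification (the same ones <elf.h> exposes as
EI_CLASS, ELFCLASS32 and ELFCLASS64). How perf would obtain the bytes
from the DSO is left out.

```c
#include <stddef.h>
#include <string.h>

/* Values mirror the ELF specification; names are local to this sketch. */
enum { SK_EI_CLASS = 4, SK_EI_NIDENT = 16 };
enum sk_elf_class { SK_ELF_NONE = 0, SK_ELF_32 = 1, SK_ELF_64 = 2 };

/*
 * Hypothetical helper: classify a DSO from the first bytes of its file.
 * Returns SK_ELF_NONE when the buffer is too short, is not an ELF image,
 * or carries an unknown class byte.
 */
static enum sk_elf_class sk_elf_class_of(const unsigned char *ident, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    if (ident == NULL || len < SK_EI_NIDENT)
        return SK_ELF_NONE;
    if (memcmp(ident, magic, sizeof(magic)) != 0)
        return SK_ELF_NONE;

    switch (ident[SK_EI_CLASS]) {
    case 1:
        return SK_ELF_32;   /* 32-bit object: compat task unwind needed */
    case 2:
        return SK_ELF_64;   /* 64-bit object: native unwind */
    default:
        return SK_ELF_NONE;
    }
}
```

The result would then select which unwinder flavour to use for the
sample, which is exactly where the libunwind limitation bites.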
>>
>> okay was not aware of that restriction on libunwind. I copied Arun
>> on this response, so maybe he can comment on that.
>>
>> >
>> >   I'll try to follow up on this to see if there'd be some better
>> >   libunwind interface solution.. but that's quite long-term ;)
>> >
>> >
>> > As for the sample registers interface.
>> >
>> > Currently we have:
>> >
>> >   u64 user_sample_regs
>> >   - if != 0 we provide the user registers with mask specified
>> >     by its value
>> >
>> >   - it will stay for compat tasks as well
>>
>> What if I say EAX|EBX|R15 but the sample was captured
>> on a 32-bit task? Are you going to just store 0 for R15?
>> Unless you also store a bitmask of what was actually saved,
>> then you have to fill in non-existent registers with zeroes, otherwise
>> the tool cannot parse the sample.
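[Editorial sketch, not part of the original thread.] The zero-fill
option described above can be illustrated as follows: one u64 is emitted
per requested register, and registers that do not exist in the sampled
task are written as 0, so the record length depends only on the
requested mask. The function name, the register numbering (bit i
selects regs[i]) and the output buffer layout are assumptions made for
illustration, not the patchset's code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical writer: emit one u64 per bit set in `requested`, in
 * ascending bit order. Registers missing from `available` are written
 * as 0. regs[i] holds the value of register number i.
 * Returns the number of u64 words written (at most out_cap).
 */
static size_t sk_dump_regs_zero_fill(uint64_t requested, uint64_t available,
                                     const uint64_t regs[64],
                                     uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    for (unsigned bit = 0; bit < 64; bit++) {
        uint64_t m = UINT64_C(1) << bit;

        if (!(requested & m))
            continue;
        if (n == out_cap)
            break;              /* caller's buffer is full */
        out[n++] = (available & m) ? regs[bit] : 0;
    }
    return n;
}
```

With this layout the tool can parse a sample using the requested mask
alone, but it cannot tell a register that was really 0 from one that
was not available.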
>
> I just sent v3, with changed design to be more generic, please check
>
> anyway, currently there's no way to mix 32 and 64 bit registers in sample.
>
> As I mentioned above, once running compat task, 64 bit registers
> are stored anyway. Given that all 32 bit registers have 64 equiv.
> you can ask to store RAX|RBX|R15.
>
Well, R8-R15 do not exist in 32-bit mode. So I wonder what is saved
on the stack for those, probably nothing. And in that case, how do you
handle the case where the user asked for R15 but it is not available and
you know that only at PMU interrupt time?


> You need to know whether to examine 32 or 64 bit register afterwards.
>
>>
>>
>> >   - we could use PERF_SAMPLE_USER_REGS sample type instead of the != 0
>> >     check to be more consistent, but that would eat up one sample bit
>> >     unnecessarily
>>
>> But then that would be aligned with how branch_stack has been implemented
>> for instance (PERF_SAMPLE_BRANCH_STACK).
>>
>> >
>> > In some previous email you suggested some generic interface like
>> >
>> >     attr->sample_type |= PERF_SAMPLE_REGS
>> >     attr->sample_regs = EAX | EBX | EDI | ESI |.....
>> >     attr->sample_reg_mode = { INTR, PRECISE, USER }
>> >
>> > I think we can have something like:
>> >
>> >     attr->sample_type |= PERF_SAMPLE_REGS
>> >     attr->sample_reg_mode = { INTR, PRECISE, USER }
>> >
>> > but in case we want eg both USER and INTR modes together then we still
>> > need to have:
>> >
>> >   u64 user_sample_regs
>> >   u64 intr_sample_regs
>> >   ...
>> >
>> Yes. but if we allow any combinations, then you'd need
>> u64 user_sample_regs
>> u64 intr_sample_regs
>> u64 precise_sample_regs
>>
>> Note that in the case of Intel PEBS used for precise mode, there are
>> only a subset of the INTR registers available.
>>
>> > for the register modes mask definition. Some mode combinations might be
>> > useless, but I think this could work.. we could always customize our
>> > needs with new mode ;)
>> >
>> The INTR vs. PRECISE is useful to get an idea of the skid.
>> The USER vs. INTR is useful to determine how we entered
>> the kernel in case the IP @ INTR is in the kernel.
>>
>> > I'll start to work on this unless I hear some screaming ;)
>> >
>
> my thinking with v3 was to have new sample type PERF_SAMPLE_REGS
>
> Once set there's perf_event_attr::sample_regs value carrying the
> kind of registers we want to store.
>
> Currently there's just following user regs bit:
>
> enum perf_sample_regs {
>       PERF_SAMPLE_REGS_USER  = 1U << 0, /* user registers */
>       PERF_SAMPLE_REGS_MAX   = 1U << 1, /* non-ABI */
> };
>
> If PERF_SAMPLE_REGS_USER is set then perf_event_attr::sample_regs_user
> gives the mask of user registers to store.
>
> we could add more bits like:
>       PERF_SAMPLE_REGS_KERNEL
>       PERF_SAMPLE_REGS_PRECISE
> Â Â Â ...
>
> to determine the kind of registers we want to dump and
> retrieve registers accordingly. And if the bit needs
> additional info we add new perf_event_attr value same
> like in sample_regs_user case.
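[Editorial sketch, not part of the original thread.] To make the v3
layout described above concrete: a "kind" bitfield says which register
sets to dump, and each kind has its own mask field. The struct below is
a hypothetical two-field stand-in for perf_event_attr, and the
consistency rules in the check function are assumptions for
illustration; the thread does not say what validation the patch
performs.

```c
#include <stdint.h>

/* Mirrors the enum quoted above; the SK_ names are local to this sketch. */
enum sk_sample_regs {
    SK_SAMPLE_REGS_USER = 1U << 0,  /* user registers */
    SK_SAMPLE_REGS_MAX  = 1U << 1,  /* first bit that is not a valid kind */
};

/* Hypothetical subset of perf_event_attr. */
struct sk_attr {
    uint64_t sample_regs;       /* which kinds of registers to dump */
    uint64_t sample_regs_user;  /* mask of user registers to store */
};

/*
 * Assumed consistency rules: no unknown kind bits, and the user mask is
 * non-zero exactly when the USER kind is requested.
 * Returns 0 when the attr is consistent, -1 otherwise.
 */
static int sk_validate_attr(const struct sk_attr *a)
{
    uint64_t known = (uint64_t)SK_SAMPLE_REGS_MAX - 1;
    int want_user = (a->sample_regs & SK_SAMPLE_REGS_USER) != 0;

    if (a->sample_regs & ~known)
        return -1;              /* unknown register kind */
    if (want_user != (a->sample_regs_user != 0))
        return -1;              /* kind bit and mask disagree */
    return 0;
}
```

Adding a KERNEL or PRECISE kind would then mean one more enum bit plus
one more mask field checked the same way.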
>
>
>>
>> In any case, the important issue is how does the kernel
>> satisfy the request for registers when those may not
>> be available in the interrupted task AND it is impossible
>> to know this in advance.
>>
>> Note that in the case of precise on Intel, we know in advance
>> which registers will be available. So you can fail early, when
>> the event is created.
>>
>> The alternative is to include the bitmask of which registers
>> were actually saved at the beginning of the section after the
>> ABI type flag.
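[Editorial sketch, not part of the original thread.] The saved-bitmask
alternative could be laid out as below: the first u64 of the register
section is the mask of registers actually stored, followed by one u64
per set bit in ascending order. The function names, the register
numbering (bit i selects regs[i]) and the omission of the ABI type flag
are assumptions made to keep the example small.

```c
#include <stddef.h>
#include <stdint.h>

/* Count set bits in a 64-bit mask (portable, no compiler builtins). */
static unsigned sk_popcount64(uint64_t v)
{
    unsigned n = 0;

    for (; v != 0; v &= v - 1)
        n++;
    return n;
}

/*
 * Hypothetical writer: out[0] is the mask actually saved, followed by
 * one u64 per saved register in ascending bit order.
 * Returns words written, or 0 if the buffer cannot hold the record.
 */
static size_t sk_dump_regs_with_mask(uint64_t requested, uint64_t available,
                                     const uint64_t regs[64],
                                     uint64_t *out, size_t out_cap)
{
    uint64_t saved = requested & available;
    size_t n = 0;

    if (out_cap < 1 + (size_t)sk_popcount64(saved))
        return 0;
    out[n++] = saved;
    for (unsigned bit = 0; bit < 64; bit++)
        if (saved & (UINT64_C(1) << bit))
            out[n++] = regs[bit];
    return n;
}

/*
 * Hypothetical reader: fetch register `bit` from a record.
 * Returns 1 and stores the value if it was saved, 0 otherwise.
 */
static int sk_read_reg(const uint64_t *rec, size_t rec_len,
                       unsigned bit, uint64_t *val)
{
    uint64_t m;
    size_t idx;

    if (rec_len == 0 || bit >= 64)
        return 0;
    m = UINT64_C(1) << bit;
    if (!(rec[0] & m))
        return 0;               /* register was not saved */
    /* Skip the mask word plus every saved register below `bit`. */
    idx = 1 + sk_popcount64(rec[0] & (m - 1));
    if (idx >= rec_len)
        return 0;               /* truncated record */
    *val = rec[idx];
    return 1;
}
```

Compared with zero-filling, the record length now varies per sample,
but the tool can distinguish "not available" from a register whose
value was 0.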
>>
>>
>> > thoughts? ;)
>> >
>> >
>> > thanks and sorry for long email,
>> > jirka