Re: perf_counters issue with enable_on_exec

From: stephane eranian
Date: Mon Aug 24 2009 - 12:03:36 EST

On Mon, Aug 24, 2009 at 5:44 PM, stephane eranian<eranian@xxxxxxxxxxxxxx> wrote:
> On Mon, Aug 24, 2009 at 3:46 PM, Peter Zijlstra<a.p.zijlstra@xxxxxxxxx> wrote:
>> On Thu, 2009-08-20 at 15:49 +0200, stephane eranian wrote:
>>> Hi,
>>> I am running into an issue trying to use enable_on_exec
>>> in per-thread mode with an event group.
>>> My understanding is that enable_on_exec allows activation
>>> of an event on first exec. This is useful for tools monitoring
>>> other tasks and which you invoke as: tool my_program. In
>>> other words, the tool forks+execs my_program. This option
>>> allows developers to set up the events after the fork (to get
>>> the pid) but before the exec(). Only execution after the exec
>>> is monitored. This alleviates the need to use the
>>> ptrace(PTRACE_TRACEME) call.
>>> My understanding is that an event group is scheduled only
>>> if all events in the group are active (disabled=0). Thus, one
>>> trick to activate a group with a single ioctl(PERF_IOC_ENABLE)
>>> is to enable all events in the group except the leader. This works
>>> well. But once you add enable_on_exec on the events,
>>> things go wrong. The non-leader events start counting before
>>> the exec. If the non-leader events are created in disabled state,
>>> then they never activate on exec.
>>> The attached test program demonstrates the problem.
>>> Simply invoke it with a program that runs for a few seconds.
>> OK, lots of issues here
>> 1) your code is broken ;-)
> That's true. I knew about the missing synchro. But I think
> the problem existed nonetheless.
>> 2) enable_on_exec on !leader counters is undefined
> then fail it.
>> 3) there is something fishy nonetheless
> True.
>> 1. you fork() then create a counter group in both the parent and the
>> child without sync, then read the parent group. This obviously doesn't
>> do what is expected. See attached proglet for a better version.
> I have modified the program based on your changes. See new version attached.
>> 2. enable_on_exec only works on leaders, Paul, was that intended?
> All events in a group are scheduled together. If one event is not enabled
> in a group, then the group is not dispatched. Setting enable_on_exec
> just on the leader makes sense. Then to enable the group on exec, you
> enable all events but the leader. The enable_on_exec will enable
> the leader on exec and the group will be ready for dispatch. That's
> how it should work in my mind.
> As you indicated the issue is with the timing information and I think
> it is not related to enable_on_exec. It is more related to the fact
> that to enable a group with a single ioctl() you enable ALL BUT the
> leader. But that means that the time_enabled for the !leader is
> ticking. Thus scaling won't be as expected yet it is correct
> given what happens internally.
> I think there needs to be a distinction between 'enabled immediately
> but cannot run because group is not totally enabled' and 'cannot run
> because the group has been multiplexed out yet all could be dispatched
> because all events were enabled'. In the former, it seems you don't
> want time_enabled to tick, while in the latter you do. In other words,
> time_enabled ticks for each event if the group is 'dispatch-able' (or
> runnable in your terminology) otherwise it does not. time_enabled reflects
> the fact that the group could run but did not have access to the PMU
> resource because of contention with other groups.
In other words, I think time_enabled is measuring the wrong thing.
It should instead be called time_runnable and it should measure the
time during which the event is runnable, i.e., its group is runnable. That
means the event (group) could be dispatched if the PMU were "free".
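
To make the distinction concrete, here is a small user-space model of the
accounting. It is only a sketch under stated assumptions, not kernel code:
struct event_model, tick(), scale_count() and run_scenario() are hypothetical
names invented for this illustration, time is counted in abstract ticks, and
the scaling used is the usual count * time_enabled / time_running estimate.
The scenario mimics the trick discussed above: the sibling is enabled from
the start, the leader only becomes enabled later (as enable_on_exec would do),
and afterwards the group gets the PMU on every other tick.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define GROUP_SIZE 2   /* index 0 is the leader, index 1 is a sibling */

/* Hypothetical model state; these are NOT the kernel's structures. */
struct event_model {
    int enabled;            /* 1 when the event itself is enabled (disabled=0) */
    uint64_t time_enabled;  /* ticks while this event is enabled */
    uint64_t time_runnable; /* proposed: ticks while the whole group is enabled */
    uint64_t time_running;  /* ticks while the group is actually on the PMU */
    uint64_t count;         /* raw count, accumulated only while running */
};

/* A group is runnable only if every event in it is enabled. */
int group_runnable(const struct event_model *g, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (!g[i].enabled)
            return 0;
    return 1;
}

/* Advance the model by one tick. pmu_free says whether the PMU is
 * available to this group during the tick (0 = multiplexed out). */
void tick(struct event_model *g, int n, int pmu_free, uint64_t events)
{
    int runnable = group_runnable(g, n);

    for (int i = 0; i < n; i++) {
        if (g[i].enabled)
            g[i].time_enabled++;
        if (runnable)
            g[i].time_runnable++;
        if (runnable && pmu_free) {
            g[i].time_running++;
            g[i].count += events;
        }
    }
}

/* Estimate the count over the full window from the running fraction.
 * The multiplication can overflow for very large values; ignored here. */
uint64_t scale_count(uint64_t count, uint64_t window, uint64_t running)
{
    return running ? count * window / running : 0;
}

/* 10 ticks with the leader still disabled, then 20 ticks with the
 * whole group enabled but on the PMU only every other tick. */
void run_scenario(struct event_model *g)
{
    memset(g, 0, GROUP_SIZE * sizeof(*g));
    g[1].enabled = 1;                   /* sibling enabled up front */
    for (int t = 0; t < 10; t++)
        tick(g, GROUP_SIZE, 1, 100);    /* PMU free, group not runnable */
    g[0].enabled = 1;                   /* leader enabled, as on exec */
    for (int t = 0; t < 20; t++)
        tick(g, GROUP_SIZE, t % 2 == 0, 100);
}
```

In this model the sibling ends with time_enabled = 30, time_runnable = 20,
time_running = 10 and a raw count of 1000. Scaling by time_enabled yields
3000, while scaling by time_runnable yields 2000, which is what 20 runnable
ticks at 100 events per tick would actually produce.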