Re: [PATCH] perf_events: fix event scheduling issues introduced by transactional API (take 2)

From: Stephane Eranian
Date: Tue May 25 2010 - 11:02:25 EST

Next message: Steven Rostedt: "Re: [PATCH 5/8] tracing: Convert more ext4 events to DEFINE_EVENT"
Previous message: Minchan Kim: "Re: [PATCH] cache last free vmap_area to avoid restarting beginning"
In reply to: Stephane Eranian: "Re: [PATCH] perf_events: fix event scheduling issues introduced by transactional API (take 2)"
Next in thread: Peter Zijlstra: "Re: [PATCH] perf_events: fix event scheduling issues introduced by transactional API (take 2)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Ok, the patch look good expect it needs:

static int x86_pmu_commit_txn(const struct pmu *pmu)
{
......
/*
* copy new assignment, now we know it is possible
* will be used by hw_perf_enable()
*/
memcpy(cpuc->assign, assign, n*sizeof(int));

cpuc->n_txn = 0;

return 0;
}

Because you always call cancel_txn() even when commit()
succeeds. I don't really understand why. I think it could be
avoided by clearing the group_flag in commit_txn() if it
succeeds. It would also make the logical flow more natural. Why
cancel something that has succeeded. You cancel when you fail/abort.

On Tue, May 25, 2010 at 4:19 PM, Stephane Eranian <eranian@xxxxxxxxxx> wrote:
> Ok, let me check. I think it is almost right.
>
> On Tue, May 25, 2010 at 4:03 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> On Tue, 2010-05-25 at 15:39 +0200, stephane eranian wrote:
>>> On Tue, May 25, 2010 at 3:35 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>>> > On Tue, 2010-05-25 at 15:20 +0200, Stephane Eranian wrote:
>>> >
>>> >> With this patch, you can now overcommit the PMU even with pinned
>>> >> system-wide events present and still get valid counts.
>>> >
>>> > Does this patch differ from the one you send earlier?
>>> >
>>> Yes, it does. It changes the way n_added is updated.
>>>
>>> The first version was buggy (perf top would crash the machine).
>>> You cannot delay updating n_added until commit, because there
>>> are paths where you don't go through transactions. This version
>>> instead keeps the number of events last added and speculatively
>>> updates n_added (assuming success). If the transaction succeeds,
>>> then no correction is done (subtracting 0), if it fails, then the number
>>> of events just added to n_added is subtracted.
>>
>>
>> OK, just to see if I got it right, I wrote a similar patch from just the
>> changelog.
>>
>> Does the below look good?
>>
>> ---
>> Âarch/x86/kernel/cpu/perf_event.c | Â 13 +++++++++++++
>> Âkernel/perf_event.c Â Â Â Â Â Â Â| Â 11 +++++++----
>> Â2 files changed, 20 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
>> index 856aeeb..be84944 100644
>> --- a/arch/x86/kernel/cpu/perf_event.c
>> +++ b/arch/x86/kernel/cpu/perf_event.c
>> @@ -106,6 +106,7 @@ struct cpu_hw_events {
>>
>> Â Â Â Âint Â Â Â Â Â Â Â Â Â Â n_events;
>> Â Â Â Âint Â Â Â Â Â Â Â Â Â Â n_added;
>> + Â Â Â int Â Â Â Â Â Â Â Â Â Â n_txn;
>> Â Â Â Âint Â Â Â Â Â Â Â Â Â Â assign[X86_PMC_IDX_MAX]; /* event to counter assignment */
>> Â Â Â Âu64 Â Â Â Â Â Â Â Â Â Â tags[X86_PMC_IDX_MAX];
>> Â Â Â Âstruct perf_event Â Â Â *event_list[X86_PMC_IDX_MAX]; /* in enabled order */
>> @@ -983,6 +984,7 @@ static int x86_pmu_enable(struct perf_event *event)
>> Âout:
>> Â Â Â Âcpuc->n_events = n;
>> Â Â Â Âcpuc->n_added += n - n0;
>> + Â Â Â cpuc->n_txn += n - n0;
>>
>> Â Â Â Âreturn 0;
>> Â}
>> @@ -1089,6 +1091,14 @@ static void x86_pmu_disable(struct perf_event *event)
>> Â Â Â Âstruct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>> Â Â Â Âint i;
>>
>> + Â Â Â /*
>> + Â Â Â Â* If we're called during a txn, we don't need to do anything.
>> + Â Â Â Â* The events never got scheduled and ->cancel_txn will truncate
>> + Â Â Â Â* the event_list.
>> + Â Â Â Â*/
>> + Â Â Â if (cpuc->group_flags & PERF_EVENT_TXN_STARTED)
>> + Â Â Â Â Â Â Â return;
>> +
>> Â Â Â Âx86_pmu_stop(event);
>>
>> Â Â Â Âfor (i = 0; i < cpuc->n_events; i++) {
>> @@ -1379,6 +1389,7 @@ static void x86_pmu_start_txn(const struct pmu *pmu)
>> Â Â Â Âstruct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>
>> Â Â Â Âcpuc->group_flag |= PERF_EVENT_TXN_STARTED;
>> + Â Â Â cpuc->n_txn = 0;
>> Â}
>>
>> Â/*
>> @@ -1391,6 +1402,8 @@ static void x86_pmu_cancel_txn(const struct pmu *pmu)
>> Â Â Â Âstruct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>
>> Â Â Â Âcpuc->group_flag &= ~PERF_EVENT_TXN_STARTED;
>> + Â Â Â cpuc->n_added -= cpuc->n_txn;
>> + Â Â Â cpuc->n_events -= cpuc->n_txn;
>> Â}
>>
>> Â/*
>> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
>> index e099650..ca79f2a 100644
>> --- a/kernel/perf_event.c
>> +++ b/kernel/perf_event.c
>> @@ -652,8 +652,11 @@ group_sched_in(struct perf_event *group_event,
>> Â Â Â Âif (txn)
>> Â Â Â Â Â Â Â Âpmu->start_txn(pmu);
>>
>> - Â Â Â if (event_sched_in(group_event, cpuctx, ctx))
>> + Â Â Â if (event_sched_in(group_event, cpuctx, ctx)) {
>> + Â Â Â Â Â Â Â if (txn)
>> + Â Â Â Â Â Â Â Â Â Â Â pmu->cancel_txn(pmu);
>> Â Â Â Â Â Â Â Âreturn -EAGAIN;
>> + Â Â Â }
>>
>> Â Â Â Â/*
>> Â Â Â Â * Schedule in siblings as one group (if any):
>> @@ -675,9 +678,6 @@ group_sched_in(struct perf_event *group_event,
>> Â Â Â Â}
>>
>> Âgroup_error:
>> - Â Â Â if (txn)
>> - Â Â Â Â Â Â Â pmu->cancel_txn(pmu);
>> -
>> Â Â Â Â/*
>> Â Â Â Â * Groups can be scheduled in as one unit only, so undo any
>> Â Â Â Â * partial group before returning:
>> @@ -689,6 +689,9 @@ group_error:
>> Â Â Â Â}
>> Â Â Â Âevent_sched_out(group_event, cpuctx, ctx);
>>
>> + Â Â Â if (txn)
>> + Â Â Â Â Â Â Â pmu->cancel_txn(pmu);
>> +
>> Â Â Â Âreturn -EAGAIN;
>> Â}
>>
>>
>>
>
>
>
> --
> Stephane Eranian Â| EMEA Software Engineering
> Google France | 38 avenue de l'OpÃra | 75002 Paris
> Tel : +33 (0) 1 42 68 53 00
> This email may be confidential or privileged. If you received this
> communication by mistake, please
> don't forward it to anyone else, please erase all copies and
> attachments, and please let me know that
> it went to the wrong person. Thanks
>

--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'OpÃra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Steven Rostedt: "Re: [PATCH 5/8] tracing: Convert more ext4 events to DEFINE_EVENT"
Previous message: Minchan Kim: "Re: [PATCH] cache last free vmap_area to avoid restarting beginning"
In reply to: Stephane Eranian: "Re: [PATCH] perf_events: fix event scheduling issues introduced by transactional API (take 2)"
Next in thread: Peter Zijlstra: "Re: [PATCH] perf_events: fix event scheduling issues introduced by transactional API (take 2)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]