Re: [PATCH] perf_events: fix event scheduling issues introduced by transactional API (take 2)

From: Stephane Eranian
Date: Tue May 25 2010 - 10:19:31 EST


Ok, let me check. I think it is almost right.

On Tue, May 25, 2010 at 4:03 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Tue, 2010-05-25 at 15:39 +0200, stephane eranian wrote:
>> On Tue, May 25, 2010 at 3:35 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > On Tue, 2010-05-25 at 15:20 +0200, Stephane Eranian wrote:
>> >
>> >> With this patch, you can now overcommit the PMU even with pinned
>> >> system-wide events present and still get valid counts.
>> >
>> > Does this patch differ from the one you send earlier?
>> >
>> Yes, it does. It changes the way n_added is updated.
>>
>> The first version was buggy (perf top would crash the machine).
>> You cannot delay updating n_added until commit, because there
>> are paths where you don't go through transactions. This version
>> instead keeps the number of events last added and speculatively
>> updates n_added (assuming success). If the transaction succeeds,
>> then no correction is done (subtracting 0), if it fails, then the number
>> of events just added to n_added is subtracted.
>
>
> OK, just to see if I got it right, I wrote a similar patch from just the
> changelog.
>
> Does the below look good?
>
> ---
> Âarch/x86/kernel/cpu/perf_event.c | Â 13 +++++++++++++
> Âkernel/perf_event.c       Â|  11 +++++++----
> Â2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
> index 856aeeb..be84944 100644
> --- a/arch/x86/kernel/cpu/perf_event.c
> +++ b/arch/x86/kernel/cpu/perf_event.c
> @@ -106,6 +106,7 @@ struct cpu_hw_events {
>
>    Âint           n_events;
>    Âint           n_added;
> +    int           n_txn;
>    Âint           assign[X86_PMC_IDX_MAX]; /* event to counter assignment */
> Â Â Â Âu64 Â Â Â Â Â Â Â Â Â Â tags[X86_PMC_IDX_MAX];
>    Âstruct perf_event    *event_list[X86_PMC_IDX_MAX]; /* in enabled order */
> @@ -983,6 +984,7 @@ static int x86_pmu_enable(struct perf_event *event)
> Âout:
> Â Â Â Âcpuc->n_events = n;
> Â Â Â Âcpuc->n_added += n - n0;
> + Â Â Â cpuc->n_txn += n - n0;
>
> Â Â Â Âreturn 0;
> Â}
> @@ -1089,6 +1091,14 @@ static void x86_pmu_disable(struct perf_event *event)
> Â Â Â Âstruct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
> Â Â Â Âint i;
>
> + Â Â Â /*
> + Â Â Â Â* If we're called during a txn, we don't need to do anything.
> + Â Â Â Â* The events never got scheduled and ->cancel_txn will truncate
> + Â Â Â Â* the event_list.
> + Â Â Â Â*/
> + Â Â Â if (cpuc->group_flags & PERF_EVENT_TXN_STARTED)
> + Â Â Â Â Â Â Â return;
> +
> Â Â Â Âx86_pmu_stop(event);
>
> Â Â Â Âfor (i = 0; i < cpuc->n_events; i++) {
> @@ -1379,6 +1389,7 @@ static void x86_pmu_start_txn(const struct pmu *pmu)
> Â Â Â Âstruct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>
> Â Â Â Âcpuc->group_flag |= PERF_EVENT_TXN_STARTED;
> + Â Â Â cpuc->n_txn = 0;
> Â}
>
> Â/*
> @@ -1391,6 +1402,8 @@ static void x86_pmu_cancel_txn(const struct pmu *pmu)
> Â Â Â Âstruct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>
> Â Â Â Âcpuc->group_flag &= ~PERF_EVENT_TXN_STARTED;
> + Â Â Â cpuc->n_added -= cpuc->n_txn;
> + Â Â Â cpuc->n_events -= cpuc->n_txn;
> Â}
>
> Â/*
> diff --git a/kernel/perf_event.c b/kernel/perf_event.c
> index e099650..ca79f2a 100644
> --- a/kernel/perf_event.c
> +++ b/kernel/perf_event.c
> @@ -652,8 +652,11 @@ group_sched_in(struct perf_event *group_event,
> Â Â Â Âif (txn)
> Â Â Â Â Â Â Â Âpmu->start_txn(pmu);
>
> - Â Â Â if (event_sched_in(group_event, cpuctx, ctx))
> + Â Â Â if (event_sched_in(group_event, cpuctx, ctx)) {
> + Â Â Â Â Â Â Â if (txn)
> + Â Â Â Â Â Â Â Â Â Â Â pmu->cancel_txn(pmu);
> Â Â Â Â Â Â Â Âreturn -EAGAIN;
> + Â Â Â }
>
> Â Â Â Â/*
> Â Â Â Â * Schedule in siblings as one group (if any):
> @@ -675,9 +678,6 @@ group_sched_in(struct perf_event *group_event,
> Â Â Â Â}
>
> Âgroup_error:
> - Â Â Â if (txn)
> - Â Â Â Â Â Â Â pmu->cancel_txn(pmu);
> -
> Â Â Â Â/*
> Â Â Â Â * Groups can be scheduled in as one unit only, so undo any
> Â Â Â Â * partial group before returning:
> @@ -689,6 +689,9 @@ group_error:
> Â Â Â Â}
> Â Â Â Âevent_sched_out(group_event, cpuctx, ctx);
>
> + Â Â Â if (txn)
> + Â Â Â Â Â Â Â pmu->cancel_txn(pmu);
> +
> Â Â Â Âreturn -EAGAIN;
> Â}
>
>
>



--
Stephane Eranian | EMEA Software Engineering
Google France | 38 avenue de l'OpÃra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/