RE: [PATCH 1/1] perf/core: find auxiliary events in running pmus list

From: Liang, Kan
Date: Sun Feb 28 2016 - 11:31:57 EST

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz@xxxxxxxxxxxxx]
> Sent: Friday, February 26, 2016 5:37 AM
> To: Liang, Kan
> Cc: linux-kernel@xxxxxxxxxxxxxxx; ak@xxxxxxxxxxxxxxx;
> eranian@xxxxxxxxxx; vincent.weaver@xxxxxxxxx; tglx@xxxxxxxxxxxxx;
> mingo@xxxxxxxxxx; acme@xxxxxxxxxx; jolsa@xxxxxxxxxx;
> ying.huang@xxxxxxxxxxxxxxx; alexander.shishkin@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH 1/1] perf/core: find auxiliary events in running pmus
> list
>
> > +static void account_running_pmu(struct perf_event *event)
> > +{
> > +        struct running_pmu *pmu;
> > +
> > +        mutex_lock(&running_pmus_lock);
> > +
> > +        list_for_each_entry(pmu, &running_pmus, entry) {
> > +                if (pmu->pmu == event->pmu) {
> > +                        pmu->nr_event++;
> > +                        goto out;
> > +                }
> > +        }
> > +
> > +        pmu = kzalloc(sizeof(struct running_pmu), GFP_KERNEL);
> > +        if (pmu != NULL) {
> > +                pmu->nr_event++;
> > +                pmu->pmu = event->pmu;
> > +                list_add_rcu(&pmu->entry, &running_pmus);
> > +        }
>
> That kzalloc() doesn't make any sense, why not simply add a member to
> struct pmu?
>
> > +out:
> > +        mutex_unlock(&running_pmus_lock);
> > +}
>
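
(A per-pmu counter would indeed avoid the allocation completely. A minimal
untested sketch of what I understand you to mean, with nr_sb_events as an
assumed field name, not something that exists today:)

        /* in struct pmu, include/linux/perf_event.h */
        atomic_t                        nr_sb_events;

        static void account_running_pmu(struct perf_event *event)
        {
                /* one counter per pmu; no allocation, no global list */
                atomic_inc(&event->pmu->nr_sb_events);
        }

        static void unaccount_running_pmu(struct perf_event *event)
        {
                atomic_dec(&event->pmu->nr_sb_events);
        }
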
> In any case, can't we replace the whole perf_event_aux muck with a data
> structure for finding interested events?
>
> Because not only is iterating all PMUs silly, we then further iterate all
> events in whatever contexts we find from them. Even if none of the
> events in these contexts is interested in the side-band event.
>
> We have:
> mmap,comm,task,mmap_data,mmap2,comm_exec,context_switch
>
>
> Which I think we can reduce like:
>
> {mmap,mmap_data,mmap2} -> mmap
> {comm,comm_exec} -> comm
>
> mmap,comm,task,context_switch
>
> This would allow us to keep 4 per-cpu lists of events like:
>
>
> struct pmu_event_list {
>         raw_spinlock_t          lock;
>         struct list_head        list;
> };
>
> enum event_sb_channel {
>         sb_mmap = 0,
>         sb_comm,
>         sb_task,
>         sb_switch,
>         sb_nr,
> };
>
> static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events[sb_nr]);
>
> static void attach_sb_event(struct perf_event *event, enum event_sb_channel sb)
> {
>         struct pmu_event_list *pel = per_cpu_ptr(&pmu_sb_events[sb], event->cpu);
>
>         raw_spin_lock(&pel->lock);
>         list_add_rcu(&event->sb_list[sb], &pel->list);
>         raw_spin_unlock(&pel->lock);
> }
>
> static void account_pmu_sb_event(struct perf_event *event)
> {
>         if (event->parent)
>                 return;
>
>         if (event->attach & ATTACH_TASK)
>                 return;
>
>         if (event->attr.mmap || event->attr.mmap_data || event->attr.mmap2)
>                 attach_sb_event(event, sb_mmap);
>
>         if (event->attr.comm || event->attr.comm_exec)
>                 attach_sb_event(event, sb_comm);
>
>         if (event->attr.task)
>                 attach_sb_event(event, sb_task);
>
>         if (event->attr.context_switch)
>                 attach_sb_event(event, sb_switch);
> }
>
> /* matching unaccount */
>
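
(The matching unaccount would presumably just mirror the above and unhook
the event from each list again; an untested sketch, where detach_sb_event
and unaccount_pmu_sb_event are names I'm assuming:)

        static void detach_sb_event(struct perf_event *event, enum event_sb_channel sb)
        {
                struct pmu_event_list *pel = per_cpu_ptr(&pmu_sb_events[sb], event->cpu);

                raw_spin_lock(&pel->lock);
                list_del_rcu(&event->sb_list[sb]);
                raw_spin_unlock(&pel->lock);
        }

        static void unaccount_pmu_sb_event(struct perf_event *event)
        {
                if (event->parent)
                        return;

                if (event->attach & ATTACH_TASK)
                        return;

                if (event->attr.mmap || event->attr.mmap_data || event->attr.mmap2)
                        detach_sb_event(event, sb_mmap);

                if (event->attr.comm || event->attr.comm_exec)
                        detach_sb_event(event, sb_comm);

                if (event->attr.task)
                        detach_sb_event(event, sb_task);

                if (event->attr.context_switch)
                        detach_sb_event(event, sb_switch);
        }
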
>
> static void perf_event_sb_iterate(enum event_sb_channel sb,
>                                   perf_event_sb_output_f output, void *data)
> {
>         struct pmu_event_list *pel = __this_cpu_ptr(&pmu_sb_events[sb]);
>         struct perf_event *event;
>
>         list_for_each_entry_rcu(event, &pel->list, sb_list[sb]) {
>                 if (event->state < PERF_EVENT_STATE_INACTIVE)
>                         continue;
>                 if (!event_filter_match(event))
>                         continue;
>                 output(event, data);
>         }
> }
>
> static void perf_event_sb_mask(unsigned int sb_mask,
>                                perf_event_sb_output_f output, void *data)
> {
>         int sb;
>
>         for (sb = 0; sb < sb_nr; sb++) {
>                 if (!(sb_mask & (1 << sb)))
>                         continue;
>                 perf_event_sb_iterate(sb, output, data);
>         }
> }
>
> Note: the mask is needed because a task event (as per perf_event_task)
> needs to go out to sb_comm, sb_mmap and sb_task, see
> perf_event_task_match().
>
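
(So, if I read this right, perf_event_task() would then emit with a combined
mask, roughly:)

        perf_event_sb_mask((1 << sb_comm) | (1 << sb_mmap) | (1 << sb_task),
                           perf_event_task_output, task_event);

(with perf_event_task_output/task_event being the existing output callback
and its on-stack argument.)
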
> And then you can replace the whole global part of perf_event_aux (which I
> would rename to perf_event_sb), with this.
>
> You still have to do something like:
>
>         for_each_task_context_nr(ctxn) {
>                 ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
>                 if (!ctx)
>                         continue;
>                 perf_event_sb_ctx(ctx, output, data);
>         }
>
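
(And I take it perf_event_sb_ctx() here is essentially today's
perf_event_aux_ctx(), i.e. something like:)

        static void perf_event_sb_ctx(struct perf_event_context *ctx,
                                      perf_event_sb_output_f output, void *data)
        {
                struct perf_event *event;

                list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
                        if (event->state < PERF_EVENT_STATE_INACTIVE)
                                continue;
                        if (!event_filter_match(event))
                                continue;
                        output(event, data);
                }
        }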

I'm not sure why we still need something like for_each_task_context_nr(ctxn)
in perf_event_aux/perf_event_sb. Isn't perf_event_sb_mask enough on its own?
It looks like perf_event_sb_mask already goes through all possibly interested
events (both per-CPU and per-task events), which is what we want to do in
perf_event_aux, right?

Thanks,
Kan

> To get at the per task events, because I don't think we want to go update
> more global state on context switch, nor am I entirely sure it's worth it to
> keep per sb ctx->event_list[]s.