Re: [PATCH v1 4/6] perf: Allow using AUX data in perf samples

From: Alexander Shishkin
Date: Tue Jun 19 2018 - 06:47:35 EST


On Thu, Jun 14, 2018 at 10:20:22PM +0200, Peter Zijlstra wrote:
> On Tue, Jun 12, 2018 at 10:51:15AM +0300, Alexander Shishkin wrote:
> > +static unsigned long perf_aux_sample_size(struct perf_event *event,
> > + struct perf_sample_data *data,
> > + size_t size)
> > +{
> > + struct perf_event *sampler = event->sample_event;
> > + struct ring_buffer *rb;
> > + int *disable_count;
> > +
> > + data->aux.size = 0;
> > +
> > + if (!sampler || READ_ONCE(sampler->state) != PERF_EVENT_STATE_ACTIVE)
> > + goto out;
> > +
> > + if (READ_ONCE(sampler->oncpu) != smp_processor_id())
> > + goto out;
>
> Should those not be WARNs ? If we allow a configuration where that is
> possible, we're doing it wrong I think.

Indeed, this shouldn't happen.

> > + /*
> > + * Non-zero disable count here means that we, being the NMI
> > + * context, are racing with pmu::add, pmu::del or address filter
> > + * adjustment, which we want to avoid.
> > + */
> > + disable_count = this_cpu_ptr(sampler->pmu->pmu_disable_count);
> > + if (*disable_count)
> > + goto out;
> > +
> > + /* Re-enabled in perf_aux_sample_output() */
> > + perf_pmu_disable(sampler->pmu);
>
> This is disguisting.. and also broken I think. Imagine what happens
> when the NMI hits in middle of perf_pmu_enable(), right where count
> dropped to 0, but we've not yet done pmu->pmu_enable() yet.
>
> Then we end up with a double disable and double enable.

True, we kind of tiptoe around that by not having ->pmu_enable() and
->pmu_disable() for PT.

> > +
> > + rb = ring_buffer_get(sampler);
> > + if (!rb) {
> > + perf_pmu_enable(sampler->pmu);
> > + goto out;
> > + }
> > +
> > + /* Restarted in perf_aux_sample_output() */
> > + sampler->pmu->stop(sampler, PERF_EF_UPDATE);
>
> More yuck...
>
> You rreally should not be calling these pmu::methods, they're meant to
> be used from _interrupt_ not NMI context. Using them like this is asking
> for tons of trouble.

Right, the SW stuff may then race with event_function_call() stuff. Hmm.
For the HW stuff, I'm hoping that some kind of a sleight of hand may
suffice. Let me think some more.

> Why can't you just snapshot the current location and let the thing
> 'run' ?

Because the buffer will overwrite itself and the location will be useless.
We don't write the AUX data out in this 'mode' at all, only the samples,
which allows for much less data in the resulting perf.data, less work for
the consumer, less IO bandwidth etc, and as a bonus, no AUX-related
interrupts.

But actually, even to snapshot the location we need to stop the event.

> > + data->aux.to = rb->aux_head;
> > +
> > + size = min(size, perf_aux_size(rb));
> > +
> > + if (data->aux.to < size)
> > + data->aux.from = rb->aux_nr_pages * PAGE_SIZE + data->aux.to -
> > + size;
> > + else
> > + data->aux.from = data->aux.to - size;
> > + data->aux.size = ALIGN(size, sizeof(u64));
> > + ring_buffer_put(rb);
> > +
> > +out:
> > + return data->aux.size;
> > +}
> > +
> > +static void perf_aux_sample_output(struct perf_event *event,
> > + struct perf_output_handle *handle,
> > + struct perf_sample_data *data)
> > +{
>
> > + ring_buffer_put(rb);
> > + sampler->pmu->start(sampler, 0);
> > +
> > +out_enable:
> > + perf_pmu_enable(sampler->pmu);
>
> More of that... really yuck stuff.
>
> > +}