Re: [tip:perf/core] perf: Ignore non-sampling overflows

From: Peter Zijlstra
Date: Tue Jun 28 2011 - 07:09:02 EST


On Tue, 2011-06-28 at 12:53 +0200, Robert Richter wrote:
> > --- a/kernel/perf_event.c
> > +++ b/kernel/perf_event.c
> > @@ -4240,6 +4240,13 @@ static int __perf_event_overflow(struct perf_event *event, int nmi,
> > 	struct hw_perf_event *hwc = &event->hw;
> > 	int ret = 0;
> >
> > +	/*
> > +	 * Non-sampling counters might still use the PMI to fold short
> > +	 * hardware counters, ignore those.
> > +	 */
> > +	if (unlikely(!is_sampling_event(event)))
> > +		return 0;
> > +

> do you remember the background of this change? This check silently
> drops data of non-sampling events. I want to use perf_event_overflow()
> to write to the buffer and want to modify the check, but don't see
> which 'accidental' interrupts may occur that must be ignored.

IIRC this is because we always program the interrupt bit, such that when
the counter overflows we can account and reprogram the thing. This is
needed because no hardware counter is in fact 64 bits wide. Therefore we
have to program the counter to its max width and properly account the
state and reprogram on overflow.

Imagine a 32-bit cycle counter (@1GHz): if we were not to program it to
take interrupts and nobody read that counter for about 4.3 seconds
(2^32 cycles), it would have wrapped and we'd have lost the actual count
value for the thing.
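
To put a number on that, here is a small userspace sketch of the
arithmetic. It is only an illustration; wrap_seconds() is a made-up helper
name, not anything in the kernel.

```c
#include <assert.h>

/*
 * Seconds until a free-running counter of 'width_bits' bits wraps when it
 * increments at 'rate_hz' events per second.
 * Hypothetical helper, for illustration only.
 */
static double wrap_seconds(unsigned int width_bits, double rate_hz)
{
	assert(width_bits < 64);	/* keep the shift well defined */
	assert(rate_hz > 0.0);

	return (double)(1ULL << width_bits) / rate_hz;
}
```

wrap_seconds(32, 1e9) comes out at roughly 4.29, and wrap_seconds(31, 1e9)
at roughly 2.15, which is how often the interrupt would fire for a counter
programmed with a 31-bit period at that rate.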

So what we do is program it at 31 bits (so that the msb can toggle and
trigger the interrupt), and on interrupt add to event->count and reset
the hardware to start counting again.
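
Roughly, the folding scheme looks like the simulation below. This is a
userspace sketch under assumptions, not the kernel code: the 32-bit width,
struct sim_counter and the sim_*() helpers are all invented for
illustration, and 'count' merely plays the role of event->count.

```c
#include <assert.h>
#include <stdint.h>

#define CNT_BITS	32u				/* assumed hardware width */
#define CNT_MASK	((1ULL << CNT_BITS) - 1)
#define MAX_PERIOD	(1ULL << (CNT_BITS - 1))	/* 31 bits */

/* Hypothetical stand-in for the per-event state. */
struct sim_counter {
	uint64_t hw;	/* simulated raw register, only CNT_BITS valid */
	uint64_t prev;	/* raw value software last accounted for */
	uint64_t count;	/* 64-bit running total */
};

/* Start at -MAX_PERIOD: msb set, wraps to 0 after MAX_PERIOD events. */
static void sim_program(struct sim_counter *c)
{
	c->hw = (0 - MAX_PERIOD) & CNT_MASK;
	c->prev = c->hw;
}

/* Fold the events seen since the last update into the 64-bit total. */
static void sim_update(struct sim_counter *c)
{
	c->count += (c->hw - c->prev) & CNT_MASK;
	c->prev = c->hw;
}

/* Feed n events to the counter; returns how many interrupts were taken. */
static unsigned int sim_run(struct sim_counter *c, uint64_t n)
{
	unsigned int irqs = 0;

	assert(c->hw != 0);	/* must have been programmed first */

	while (n) {
		uint64_t left = (CNT_MASK + 1) - c->hw;	/* events until wrap */
		uint64_t step = n < left ? n : left;

		c->hw = (c->hw + step) & CNT_MASK;
		n -= step;

		if (c->hw == 0) {	/* wrapped: the overflow interrupt */
			sim_update(c);	/* account */
			sim_program(c);	/* and start counting again */
			irqs++;
		}
	}
	return irqs;
}

/* A plain read also folds whatever accumulated since the last interrupt. */
static uint64_t sim_read(struct sim_counter *c)
{
	sim_update(c);
	return c->count;
}
```

Feeding 10^10 events through a freshly programmed counter takes four
interrupts (one per 2^31 events) and sim_read() still returns 10^10, even
though the simulated register itself never holds more than 32 bits.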

Now some arch/*/perf_event.c implementations unconditionally called
perf_event_overflow() from their IRQ handler, even for such non-sampling
counters.
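
As a toy model of why the check is there (a sketch under assumptions:
struct toy_event and the toy_*() functions are invented, and the only thing
borrowed from the kernel is the idea that a sampling event is one with a
non-zero sample period):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event state, not the kernel's struct perf_event. */
struct toy_event {
	uint64_t sample_period;	/* 0 means a pure counting event */
	uint64_t count;		/* folded 64-bit total */
	unsigned int samples;	/* records written to the buffer */
};

static bool toy_is_sampling(const struct toy_event *e)
{
	return e->sample_period != 0;
}

/* Generic overflow path: only sampling events should emit a sample. */
static int toy_overflow(struct toy_event *e)
{
	if (!toy_is_sampling(e))
		return 0;	/* interrupt was only there to fold the counter */

	e->samples++;
	return 0;
}

/* Arch-style IRQ handler that calls the overflow path unconditionally. */
static void toy_irq(struct toy_event *e, uint64_t delta)
{
	e->count += delta;	/* the fold happens for every event */
	toy_overflow(e);
}
```

With the check in place a counting event that takes the interrupt has its
count updated but produces no sample, while an event with a non-zero
sample_period produces one sample per interrupt.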

