Re: PEBS bug on HSW: "Unexpected number of pebs records 10" (was:Re: [GIT PULL] perf changes for v3.12)

From: Peter Zijlstra
Date: Mon Sep 23 2013 - 13:25:33 EST


On Mon, Sep 23, 2013 at 07:11:21PM +0200, Stephane Eranian wrote:
> Ok, so what you are saying is that the ovfl_status is not kept private
> to each counter but is shared among all PEBS counters by the ucode. That's
> how you end up leaking between counters like that.

I only remember asking for clarification because the SDM isn't clear on
this subject; the answer was that it simply copies whatever is in
MSR_CORE_PERF_GLOBAL_STATUS.

I explained how this would be a problem and it was agreed this needed
fixing -- not sure if that ever happened.

> But the other thing I remember is that if two PEBS events overflow
> at the same time, PEBS only writes one record with 2 bits set in the
> ovfl_status field. There is no point in creating two because the machine
> state will be the same for both. The kernel would just need to dispatch
> the same PEBS record to all the events that overflowed.

Hurm... that makes life more interesting still. The current code only
delivers the record to the event of the first set bit. Changing this
would be simple though.

> Now, your case looks like that, except this is not what happened.
> So you're misled into believing both counters overflowed at the same time
> when they did not in reality.
>
> I'd like to have a test case where I could reproduce this.

Agreed, I've never actually tried to reproduce this. I suppose it would
be easiest to trigger when one of the events is very rare and controlled.