Re: [PATCH 4/4] [x86] perf: fix accidentally ack'ing a second event on intel perf counter

From: Robert Richter
Date: Thu Sep 02 2010 - 09:16:33 EST


On 02.09.10 04:13:19, Stephane Eranian wrote:
> Robert,
>
> Do you have the test program you used to test this?
> I believe the NHM hack does not solve the problem, it
> just makes it harder to appear.

For testing back-to-back nmis I have used:

  perf record -e cycles -e instructions -e cache-references \
      -e cache-misses -e branch-misses -a -- sleep 10

with load on all cpus. But I couldn't reproduce this particular
problem as I do not have such a system available. I think it might
also trigger with only one counter running. From what was observed in
the status bits, only one counter was involved.
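
One way to put load on all cpus is sketched below. This is only an
illustration, not the load used in the test above: the helper names
(online_cpus, start_load, stop_load) are hypothetical, and a plain busy
loop per cpu is an assumption about what counts as sufficient load.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original test: one busy loop per
# online cpu, so that the counters keep overflowing on every cpu.

# Print the number of online cpus (falls back to 1 if getconf fails).
online_cpus() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Start $1 background busy loops and record their PIDs in LOAD_PIDS.
start_load() {
    LOAD_PIDS=""
    n=0
    while [ "$n" -lt "$1" ]; do
        ( while :; do :; done ) &
        LOAD_PIDS="$LOAD_PIDS $!"
        n=$((n + 1))
    done
}

# Stop the busy loops started by start_load and reap them.
stop_load() {
    for pid in $LOAD_PIDS; do
        kill "$pid" 2>/dev/null
    done
    wait 2>/dev/null
    LOAD_PIDS=""
}
```

Calling start_load "$(online_cpus)" before the perf record command and
stop_load afterwards would keep every cpu busy for the duration of the
measurement.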

>
> I suspect the real issue is that the GLOBAL_STATUS
> bitmask cannot be trusted. I'd like to verify this.

So yes, it looks like it is a cpu bug with a race when clearing the
status. I didn't check the errata list; maybe it is already known.

>
> Has the problem appeared only on Nehalem or also on
> Westmere?

I don't know.

-Robert

--
Advanced Micro Devices, Inc.
Operating System Research Center
