Re: [PATCH 5/5] perf, x86: Prefer RDPMC over RDMSR for reading counters

From: Stephane Eranian
Date: Wed Jun 06 2012 - 10:33:50 EST


On Wed, Jun 6, 2012 at 4:21 PM, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> On Wed, 2012-06-06 at 07:16 -0700, Andi Kleen wrote:
>> On Wed, Jun 06, 2012 at 12:46:19PM +0200, Peter Zijlstra wrote:
>> > On Tue, 2012-06-05 at 17:56 -0700, Andi Kleen wrote:
>> > > From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
>> > >
>> > > RDPMC is much faster than RDMSR for reading performance counters,
>> > > since it's not serializing.  Use it if possible in the perf handler.
>> > >
>> > > Only tested on Sandy Bridge, so I only enabled it there so far.
>> >
>> > That's just stupid.. I took Vince's patch from a while back.
>>
>> What do you mean? It's significantly faster to read the counters this
>> way, because it avoids serialization and other overhead.
>
> What I'm saying is that you only enabled it for snb and were too lazy to
> test anything else. Nor do I think it's worth the conditional; all chips
> we have PMU support for have rdpmc instructions.
>
>> Vince's patch only enabled it for user space, I believe. This is for lowering
>> the kernel PMI handler overhead.
>
> No, his patch did the kernel thing. Furthermore he actually tested it on
> a bunch of machines.

Yes, his patch did, but somehow I don't see this code in tip-x86.
The thing I would worry about between rdmsrl() and rdpmc()
is what happens to the upper bits. rdpmc() returns bits [N-1:0] of
the N-bit counters. N is 48 (or 40) nowadays. When you read 64 bits'
worth, what do you get in bits [63:N]? Are those sign-extended or
zero-extended? Is that the same behavior across all Intel and AMD
processors? With perf_events, I think bit N-1 is always set.
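To make the concern concrete, here is a minimal userspace sketch of one way
to be independent of whatever ends up in bits [63:N]: shift the raw values
left by 64-N before using them, so the upper bits are discarded either way.
The helper names are hypothetical and this is not the kernel code; it also
assumes the compiler implements right shift of a negative signed value as an
arithmetic shift, which gcc and clang do.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend: keep only bits [bits-1:0] of a raw counter read. */
static inline uint64_t counter_zero_extend(uint64_t raw, unsigned bits)
{
	assert(bits > 0 && bits <= 64);
	return bits == 64 ? raw : raw & ((UINT64_C(1) << bits) - 1);
}

/*
 * Sign-extend from bit (bits-1): move the counter's top bit into bit 63,
 * then shift back. Assumes an arithmetic right shift on signed values.
 */
static inline int64_t counter_sign_extend(uint64_t raw, unsigned bits)
{
	unsigned shift;

	assert(bits > 0 && bits <= 64);
	shift = 64 - bits;
	return (int64_t)(raw << shift) >> shift;
}

/*
 * Events elapsed between two raw reads of a bits-wide counter.
 * Both values are shifted left first, so the contents of bits [63:bits]
 * (sign-extended, zero-extended, or garbage) cannot affect the result,
 * and a wrap of the counter is handled by the modular subtraction.
 */
static inline uint64_t counter_delta(uint64_t prev, uint64_t now, unsigned bits)
{
	unsigned shift;

	assert(bits > 0 && bits <= 64);
	shift = 64 - bits;
	return ((now << shift) - (prev << shift)) >> shift;
}
```

With this form, a sign-extended previous value and a zero-extended new
value of the same 48-bit counter still produce the right delta, so the two
read paths could be mixed without caring which extension the hardware uses.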