Re: [PATCH v2 12/12] KVM: x86/pmu: Clear reserved bit PERF_CTL2[43] for AMD erratum 1292

From: Jim Mattson
Date: Fri Mar 04 2022 - 14:06:28 EST


On Fri, Mar 4, 2022 at 1:47 AM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
>
> On 3/3/2022 1:52 am, Jim Mattson wrote:
> > On Wed, Mar 2, 2022 at 3:14 AM Like Xu <like.xu.linux@xxxxxxxxx> wrote:
> >>
> >> From: Like Xu <likexu@xxxxxxxxxxx>
> >>
> >> The AMD Family 19h Models 00h-0Fh Processors may experience sampling
> >> inaccuracies that cause the following performance counters to overcount
> >> retire-based events. To count the non-FP affected PMC events correctly,
> >> a patched guest with a target vCPU model would:
> >>
> >> - Use Core::X86::Msr::PERF_CTL2 to count the events, and
> >> - Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
> >> - Program Core::X86::Msr::PERF_CTL2[20] to 0b.
> >>
> >> To support this use of AMD guests, KVM should not reserve bit 43
> >> only for counter #2. Treatment of other cases remains unchanged.
> >>
> >> AMD hardware team clarified that the conditions under which the
> >> overcounting can happen, is quite rare. This change may make those
> >> PMU driver developers who have read errata #1292 less disappointed.
> >>
> >> Reported-by: Jim Mattson <jmattson@xxxxxxxxxx>
> >> Signed-off-by: Like Xu <likexu@xxxxxxxxxxx>
> >
> > This seems unnecessarily convoluted. As I've said previously, KVM
> > should not ever synthesize a #GP for any value written to a
> > PerfEvtSeln MSR when emulating an AMD CPU.
>
> IMO, we should "never synthesize a #GP" for all AMD MSRs,
> not just for AMD PMU msrs, or keep the status quo.

Then, why are you proposing this change? :-)

We should continue to synthesize a #GP for an attempt to set "must be
zero" bits or for rule violations, like "address must be canonical."
However, we have absolutely no business making up our own hardware
specification. This is a bug, and it should be fixed, like any other
bug.

> I agree with you on this AMD #GP transition, but we need at least one
> kernel cycle to make a more radical change and we don't know Paolo's
> attitude and more, we haven't received a tidal wave of user complaints.

Again, if this is your stance, why are you proposing this change? :-)

If you wait until you have a tidal wave of user complaints, you have
waited too long. It's much better to be proactive than reactive.