Re: Relax CPU features sanity checking on heterogeneous architectures

From: Mark Rutland
Date: Fri Oct 11 2019 - 09:54:37 EST


On Fri, Oct 11, 2019 at 02:33:43PM +0100, Marc Zyngier wrote:
> On Fri, 11 Oct 2019 11:50:11 +0100
> Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> > Hi,
> >
> > On Fri, Oct 11, 2019 at 11:19:00AM +0530, Sai Prakash Ranjan wrote:
> > > On the latest QCOM SoCs with big.LITTLE architectures, such as SM8150 and
> > > SC7180, the warnings below are observed while the big CPU cores boot.
> >
> > For reference, which CPUs are in those SoCs?
> >
> > > SM8150:
> > >
> > > [ 0.271177] CPU features: SANITY CHECK: Unexpected variation in
> > > SYS_ID_AA64PFR0_EL1. Boot CPU: 0x00000011112222, CPU4: 0x00000011111112
> >
> > The differing fields are EL3, EL2, and EL1: the boot CPU supports
> > AArch64 and AArch32 at those exception levels, while the secondary only
> > supports AArch64.
> >
> > Do we handle this variation in KVM?
>
> We do, at least at vcpu creation time (see kvm_reset_vcpu). But if one
> of the !AArch32 CPUs comes in late in the game (after we've started a
> guest), all bets are off (we'll schedule the 32bit guest on that CPU,
> enter the guest, immediately take an Illegal Exception Return, and
> return to userspace with KVM_EXIT_FAIL_ENTRY).

Ouch. We certainly can't remove the warning until we deal with that
somehow, then.

> Not sure we could do better, given the HW. My preference would be to
> fail these CPUs if they aren't present at boot time.

I agree; I think we need logic to check the ID register fields against
their EXACT, {LOWER,HIGHER}_SAFE, etc. rules, regardless of whether we
have an associated cap. That can then abort a late onlining of a CPU
which violates those rules w.r.t. the finalised system value.
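A minimal sketch of such a late-onlining check, assuming a simplified, hypothetical field table: `struct ftr_field`, `field_ok_for_late_cpu()` and `late_cpu_violation()` are illustrative names, not the kernel's cpufeature API. Each 4-bit field of the late CPU's ID register is compared against the finalised system value, and the first field that breaks its rule is reported. The EL0..EL3 layout of ID_AA64PFR0_EL1 is architectural; marking those fields LOWER_SAFE here is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule types, modelled loosely on EXACT / {LOWER,HIGHER}_SAFE. */
enum ftr_rule { RULE_EXACT, RULE_LOWER_SAFE, RULE_HIGHER_SAFE };

/* Hypothetical description of one 4-bit ID register field. */
struct ftr_field {
	const char *name;
	unsigned int shift;
	enum ftr_rule rule;
};

/* Extract the 4-bit unsigned field starting at bit 'shift'. */
static unsigned int field_get(uint64_t reg, unsigned int shift)
{
	assert(shift <= 60);
	return (unsigned int)((reg >> shift) & 0xf);
}

/*
 * Can a CPU whose field value is 'cpu' join a system whose finalised
 * (already advertised) value is 'sys' without invalidating it?
 */
static bool field_ok_for_late_cpu(enum ftr_rule rule, unsigned int sys,
				  unsigned int cpu)
{
	switch (rule) {
	case RULE_EXACT:
		return cpu == sys;
	case RULE_LOWER_SAFE:
		/* The system advertises the minimum; a late CPU must not go below it. */
		return cpu >= sys;
	case RULE_HIGHER_SAFE:
		/* The system advertises the maximum; a late CPU must not exceed it. */
		return cpu <= sys;
	}
	return false;
}

/* Return the first field the late CPU violates, or NULL if it may come online. */
static const struct ftr_field *
late_cpu_violation(const struct ftr_field *fields, size_t n,
		   uint64_t sys_val, uint64_t cpu_val)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int s = field_get(sys_val, fields[i].shift);
		unsigned int c = field_get(cpu_val, fields[i].shift);

		if (!field_ok_for_late_cpu(fields[i].rule, s, c))
			return &fields[i];
	}
	return NULL;
}

/*
 * ID_AA64PFR0_EL1 exception-level fields: EL0 [3:0], EL1 [7:4], EL2 [11:8],
 * EL3 [15:12]; 1 = AArch64 only, 2 = AArch64 and AArch32.
 * The LOWER_SAFE rule for each is an assumption for this sketch.
 */
static const struct ftr_field pfr0_el_fields[] = {
	{ "EL0", 0, RULE_LOWER_SAFE },
	{ "EL1", 4, RULE_LOWER_SAFE },
	{ "EL2", 8, RULE_LOWER_SAFE },
	{ "EL3", 12, RULE_LOWER_SAFE },
};
```

Fed the SM8150 values from the log above, taking the boot CPU's value as the finalised system value, the check flags EL1 (2 for the system, 1 for CPU4), so a late CPU4 would be refused rather than merely warned about.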

I suspect that we may want to split the notion of
safe-for-{user,kernel-guest} in the feature tables, as, if nothing else,
it will force us to consider those cases separately when adding new
stuff.
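To illustrate the suggested split, a minimal sketch in which each field descriptor carries two separate rules, one for what host userspace can rely on and one for what the kernel and KVM guests rely on, so whoever adds a field has to decide both. All names (`struct split_ftr_field`, `RULE_ANY`, `late_cpu_ok*()`) are hypothetical. Treating ID_AA64PFR0_EL1.EL1 as unconstrained for userspace is an assumption for illustration (host userspace runs at EL0), while the kernel/guest rule reflects that a 32-bit KVM guest needs AArch32 at EL1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical rule types; RULE_ANY means "no constraint". */
enum ftr_rule { RULE_ANY, RULE_EXACT, RULE_LOWER_SAFE, RULE_HIGHER_SAFE };

/* Hypothetical field descriptor with separate user and kernel/guest rules. */
struct split_ftr_field {
	const char *name;
	unsigned int shift;              /* field position, for documentation */
	enum ftr_rule user_rule;         /* what host userspace relies on */
	enum ftr_rule kernel_guest_rule; /* what the kernel and KVM guests rely on */
};

/* Does a late CPU's field value 'cpu' respect 'rule' w.r.t. system value 'sys'? */
static bool rule_ok(enum ftr_rule rule, unsigned int sys, unsigned int cpu)
{
	assert(sys <= 0xf && cpu <= 0xf); /* ID register fields are 4 bits wide */
	switch (rule) {
	case RULE_ANY:
		return true;
	case RULE_EXACT:
		return cpu == sys;
	case RULE_LOWER_SAFE:
		return cpu >= sys;
	case RULE_HIGHER_SAFE:
		return cpu <= sys;
	}
	return false;
}

static bool late_cpu_ok_for_user(const struct split_ftr_field *f,
				 unsigned int sys, unsigned int cpu)
{
	return rule_ok(f->user_rule, sys, cpu);
}

static bool late_cpu_ok_for_kernel_guest(const struct split_ftr_field *f,
					 unsigned int sys, unsigned int cpu)
{
	return rule_ok(f->kernel_guest_rule, sys, cpu);
}

/* A late CPU may only come online if both consumers are satisfied. */
static bool late_cpu_ok(const struct split_ftr_field *f,
			unsigned int sys, unsigned int cpu)
{
	return late_cpu_ok_for_user(f, sys, cpu) &&
	       late_cpu_ok_for_kernel_guest(f, sys, cpu);
}

/* Illustrative entry for ID_AA64PFR0_EL1.EL1 (bits [7:4]). */
static const struct split_ftr_field pfr0_el1 = {
	"EL1", 4, RULE_ANY, RULE_LOWER_SAFE
};
```

With this entry, a late CPU that drops EL1 from 2 to 1 is harmless from the userspace side but is rejected on the kernel/guest side, which is exactly the case where a 32-bit guest would otherwise hit KVM_EXIT_FAIL_ENTRY.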

Thanks,
Mark.