Re: [PATCH v3 2/5] arm64: cpufeature: Add feature to detect heterogeneous systems

From: Mark Rutland
Date: Tue Aug 20 2019 - 12:04:03 EST


On Tue, Aug 20, 2019 at 04:55:24PM +0100, Raphael Gault wrote:
> Hi Mark,
>
> Thank you for your comments.
>
> On 8/20/19 4:49 PM, Mark Rutland wrote:
> > On Tue, Aug 20, 2019 at 04:23:17PM +0100, Mark Rutland wrote:
> > > Hi Raphael,
> > >
> > > On Fri, Aug 16, 2019 at 01:59:31PM +0100, Raphael Gault wrote:
> > > > This feature is required in order to enable PMU counters direct
> > > > access from userspace only when the system is homogeneous.
> > > > This feature checks the model of each CPU brought online and compares it
> > > > to the boot CPU. If it differs then it is heterogeneous.
> > >
> > > It would be worth noting that this patch prevents heterogeneous CPUs
> > > being brought online late if the system was uniform at boot time.
> >
> > Looking again, I think I'd misunderstood how
> > ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU was dealt with, but we do have a
> > problem in this area.
> >
> > [...]
> >
> > >
> > > > + .capability = ARM64_HAS_HETEROGENEOUS_PMU,
> > > > + .type = ARM64_CPUCAP_SCOPE_LOCAL_CPU | ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU,
> > > > + .matches = has_heterogeneous_pmu,
> > > > + },
> >
> > I had a quick chat with Will, and we concluded that we must permit late
> > onlining of heterogeneous CPUs here as people are likely to rely on
> > late CPU onlining on some heterogeneous systems.
> >
> > I think the above permits that, but that also means that we need some
> > support code to fail gracefully in that case (e.g. without sending
> > a SIGILL to unaware userspace code).
>
> I understand, however, I understood that ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU
> did not allow later CPU to be heterogeneous if the capability wasn't already
> enabled.

Yes, I think that you're right. IIUC the absence of
ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU is what prevents that from
happening.

> Thus if as you say we need to allow the system to switch from
> homogeneous to heterogeneous, then I should change the type of this
> capability.

I'm afraid so!

I believe we need both ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU and
ARM64_CPUCAP_PERMITTED_FOR_LATE_CPU, so I guess we should be using
ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE.

Does that sound right to you? ... or have I confused myself again?

Thanks,
Mark.

> > That means that we'll need the counter emulation code that you had in
> > previous versions of this patch (e.g. to handle potential UNDEFs when a
> > new CPU has fewer counters than the previously online CPUs).
> >
> > Further, I think the context switch (and event index) code needs to take
> > this cap into account, and disable direct access once the system becomes
> > heterogeneous.
>
> That is a good point indeed.
>
> Thanks,
>
> --
> Raphael Gault