Re: [PATCH] perf: RISC-V: fix IRQ detection on T-Head C908

From: Conor Dooley
Date: Mon Mar 18 2024 - 19:48:42 EST


On Mon, Mar 18, 2024 at 03:46:54PM -0700, Atish Patra wrote:
> On 3/15/24 01:11, Andrew Jones wrote:
> > On Wed, Mar 13, 2024 at 09:31:26AM +0800, Inochi Amaoto wrote:
> > ...
> > > IMHO, it may be better to use a new DT property like "riscv,cpu-errata" or
> > > "<vendor>,cpu-errata". It can achieve almost everything like using pseudo
> > > isa. And the only cost I think is a small amount of code to parse this.
> > >
> >
> > What's the ACPI equivalent for this new DT property? If there isn't one,
> > then the cost is also to introduce something to the ACPI spec and add the
> > ACPI parsing code.
> >
> > I'd much rather we call specified behaviors "extensions", whether they
> > are vendor-specific or RVI standard, and then treat all extensions the
> > same way in hardware descriptions and Linux. It'd also be best if errata
> > in extension implementations were handled by replacing the extension in
> > the hardware description with a new name which is specifically for the
> > behavior Linux should expect. (Just because two extensions are almost the
> > same doesn't mean we should say we have one and then have some second
> > mechanism to say, "well, not really, instead of that, it's this". It's
> > cleaner to just remove the extension it doesn't properly implement from
> > its hardware description and create a name for the behavior it does have.)
> >
> > Errata in behaviors which don't have extension names (are hopefully few)
> > and are where mvendorid and friends would need to be checked, but then why
> > not create a pseudo extension name, as Conor suggests, so the rest of
> > Linux code can manage errata the same way it manages every other behavior?
> >
> > The growth rate of the ISA bitmap is worth thinking about, though, since
> > we have several copies of it (at least one "all harts" bitmap, one bitmap
> > for each hart, another one for each vcpu, and then there's nested virt...)
> > We don't have enough extensions to worry about it now, but we can
> > eventually try partitioning, using common maps for common bits, not
> > storing bits which can be inferred from other bits, etc.
>
> This is my biggest worry going forward. We already have an ever-growing
> standard RVI extension list. On top of that we have genuine vendor
> extensions. IMHO, errata are a bit different from extensions as there will be
> few vendor extensions in the future but many hardware errata :)

I dunno, I think there's going to be plenty of both. We may not see (or
use) a lot of vendor extensions in mainline Linux, but they will exist.

> If we start calling every hardware erratum a pseudo ISA extension, we
> will have a much bigger problem maintaining it in the future.

I've explained to you at least once already that this is not my goal.
Where there are genuine issues with the implementation of an extension,
creating a "pseudo" extension is not what I am suggesting we do.
I have no problem with the approach taken for the SiFive errata,
for example.

> We discussed this earlier during the Andes PMU extension series[1] as well.
> We have three types of extensions in discussions now.
>
> 1. standard RVI extensions
> 2. Vendor extensions
>    a. Genuine vendor extension
>    b. Vendor errata which can be described as pseudo-extensions now

> Keeping all these within a single ISA bitmap space seems very odd to me.
> I think the feasible approach would be to partition the standard and vendor
> ISA extension space as you suggested.

Let's be clear - partitioning the space is unrelated to the detection
method. We can go ahead and partition the space and use "pseudo"
extensions or we can have a unified space but use archid/impid for
detection. Having a unified space is the simpler thing to implement
right now, but it totally does not stop us breaking them out in the
future. We could even gate these custom implementations behind config
options if bloat is a concern - but multiplatform kernels are likely to
enable all the options anyway.
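
To illustrate why storage layout and detection method are independent
choices, here is a minimal sketch in C. Everything in it is hypothetical:
the extension IDs, the "xvendorpmu" name, the ID register values and the
helper names are made up for illustration and are not the kernel's actual
data structures. Either detection helper can feed either storage layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical extension IDs, not the kernel's real numbering. */
enum ext_id {
	EXT_SSCOFPMF,   /* standard RVI extension */
	EXT_XVENDORPMU, /* made-up name for a pre-standard vendor PMU feature */
	EXT_COUNT
};

/* Storage option 1: one unified bitmap. */
struct unified_caps {
	uint64_t isa;
};

/* Storage option 2: standard and vendor bits kept in separate maps. */
struct split_caps {
	uint64_t std;
	uint64_t vendor;
};

static void unified_set(struct unified_caps *c, enum ext_id e)
{
	assert(e < EXT_COUNT);
	c->isa |= UINT64_C(1) << e;
}

static bool unified_has(const struct unified_caps *c, enum ext_id e)
{
	return (c->isa >> e) & 1;
}

static void split_set(struct split_caps *c, enum ext_id e)
{
	assert(e < EXT_COUNT);
	if (e == EXT_XVENDORPMU)
		c->vendor |= UINT64_C(1) << e;
	else
		c->std |= UINT64_C(1) << e;
}

static bool split_has(const struct split_caps *c, enum ext_id e)
{
	uint64_t map = (e == EXT_XVENDORPMU) ? c->vendor : c->std;

	return (map >> e) & 1;
}

/* Detection option A: firmware (DT or ACPI) names the feature explicitly. */
static bool detect_by_name(const char *const *names, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(names[i], "xvendorpmu") == 0)
			return true;
	return false;
}

/* Detection option B: infer the feature from ID registers (made-up values). */
static bool detect_by_id(uint64_t mvendorid, uint64_t marchid)
{
	return mvendorid == 0x1234 && marchid == 0x1;
}
```

Callers only ever ask unified_has() or split_has(), so switching from the
unified map to the split one later would not require touching either
detection helper.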

> For 2.b, either we can start defining pseudo extensions or adding
> vendor/arch/impid checks.
>
> @Conor: You seem to prefer the former approach instead of adding the
> checks. Care to elaborate why you think that's a better method compared
> to a simple check?

Because I don't think that describing these as "errata" in the first
place is even accurate. This is not a case of a vendor claiming they
have Sscofpmf support but the implementation is flawed. As far as I
understand, this is a vendor creating a useful feature prior to the
creation of a standard extension.
A bit of a test for this could be "If the standard extension never
existed, would this be considered a new feature or an implementation
issue". I think this is pretty clearly in the former camp.

I do not think we should be using m*id detection for implementations of a
feature that predate the creation of a standard extension for the same purpose.
To me the main difference between a case like this and VentanaCondOps/Zicond
is that we are the ones calling this an extension (hence my use of pseudo)
and not the vendor of the IP. If T-Head were to publish a document tomorrow
on the T-Head github repo for official vendor extensions, that difference
would not even exist any longer.

I also do not believe that it is a "simple" check. The number of
implementations that could end up using this PMU could just balloon
if T-Head has no intention of switching to Sscofpmf. If they don't
balloon in this case, there's nothing stopping them ballooning in a
similar case in the future. We should let the platform firmware tell us
explicitly, be that via DT or ACPI, what features are supported rather
than try to reverse engineer it ourselves via m*id.

That leads into another general issue I have with using m*id detection,
which I think I have mentioned several times on the list - it prevents the
platform (hypervisor, emulator or firmware) from disabling that feature.
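
A minimal sketch of that problem, in C. All names and values here are
hypothetical (the "xvendorpmu" string, the ID values, the struct layout),
and the sketch assumes the hypervisor passes the ID registers through to
the guest unchanged while filtering the advertised extension list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYPO_VENDOR_ID 0x1234 /* made-up value, not a real mvendorid */
#define HYPO_ARCH_ID   0x1    /* made-up value, not a real marchid */

/* What the OS can observe about the platform it is running on. */
struct platform_view {
	const char *const *ext_names; /* extensions the platform advertises */
	size_t n_ext;
	uint64_t mvendorid;           /* ID registers as seen by the OS */
	uint64_t marchid;
};

/* Name-based detection: honours whatever the platform chose to list. */
static bool pmu_enabled_by_name(const struct platform_view *p)
{
	assert(p != NULL);
	for (size_t i = 0; i < p->n_ext; i++)
		if (strcmp(p->ext_names[i], "xvendorpmu") == 0)
			return true;
	return false;
}

/* ID-based detection: ignores the advertised list entirely. */
static bool pmu_enabled_by_id(const struct platform_view *p)
{
	assert(p != NULL);
	return p->mvendorid == HYPO_VENDOR_ID && p->marchid == HYPO_ARCH_ID;
}
```

If a hypervisor drops "xvendorpmu" from the list it hands to a guest,
pmu_enabled_by_name() returns false, but pmu_enabled_by_id() still returns
true because the ID registers have not changed.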

If I had a time machine back to when the T-Head perf or cmo stuff was
submitted, I would try to avoid any of it being merged with the m*id
detection method.

> I agree that we don't have a crystal ball and I may be proven wrong in the
> future (I will definitely be happy about that!). But given the diversity of
> the RISC-V ecosystem, I feel that may be our sad reality.

I don't understand what this comment is referring to; it lacks context
as to what the sad reality actually is.

I hope that all made sense and explained why I am against this method
for detecting what I believe to be features rather than errata,
Conor.
