Re: [PATCH 2/2] edac: add support for Amazon's Annapurna Labs EDAC

From: Benjamin Herrenschmidt
Date: Fri Jun 07 2019 - 20:21:09 EST


On Thu, 2019-06-06 at 11:33 +0100, James Morse wrote:

> > Disagree. The various drivers don't depend on each other.
> > I think we should keep the drivers separated as they are distinct and independent IP blocks.
>
> But they don't exist in isolation, they both depend on the integration-choices/firmware
> that makes up your platform.

What do you mean ? They are exposing counters from independent IP
blocks. They are independent drivers. You argument could be use to
claim the entire SoC depends on "integration choices / firmware" ... I
don't get it.

> Other platforms may have exactly the same IP blocks, configured differently, or with
> different features enabled in firmware.

Sure, like every other IP block on the planet. That has never been a
good reason to bring back the ugly spectre of myboard.c file...

> This means we can't just probe the driver based on
> the presence of the IP block, we need to know the integration choices and firmware
> settings match what the driver requires.

Such as ? I mean none of that differs between these EDAC drivers and
any other IP block, and we still probe them individually.

> (Case in point, that A57 ECC support is optional, another A57 may not have it)

So what ? That belongs in the DT.

> Descriptions of what firmware did don't really belong in the DT. Its not a hardware property.

Since when ? I'm tired of people coming up over and over about that
complete fallacy that the DT should for some obscure religious reason
be strictly limited to "HW properties". ACPI isn't. The old Open
Firmware which I used as a basis for creating the FDT wasn't.

It is perfectly legitimate for the DT to contain configuration
information and firmware choices.

What's not OK is to stick there things that are essentially specific to
the Linux driver implementation but that isn't what we are talking
about here.

> This is why its better to probe this stuff based on the machine-compatible/platform-name,
> not the presence of the IP block in the DT.

No. No no no no. This is bringing back the days of having board files
etc... this is wrong.

Those IP blocks don't need any SW coordination at runtime. The drivers
don't share data nor communicate with each other. There is absolultely
no reason to go down that path.

> Will either of your separate drivers ever run alone? If they're probed from the same
> machine-compatible this won't happen.

They should be probed independently from independent DT nodes, what's
the problem you are trying to fix here ?

> How does your memory controller report errors? Does it send back some data with an invalid
> checksum, or a specific poison/invalid flag? Will the cache report this as a cache error
> too, if its an extra signal, does the cache know what it is?

That's ok, you get the error from both sides, power has done it that
way for ever. It's not always possible to correlate anyways and it's
certainly not the job of the EDAC drivers to try.

> All these are integration choices between the two IP blocks, done as separate drivers we
> don't have anywhere to store that information.

We do, it's called the DT.

> Even if you don't care about this, making
> them separate drivers should only be done to make them usable on other platforms, where
> these choices may have been different.

That wouldn't make the drivers unusable on other platforms at all.

Cheers,
Ben.