Re: [PATCH] ARM: mvebu: Enable MBUS error propagation

From: Chris Packham
Date: Mon Apr 25 2022 - 21:17:29 EST



On 22/04/22 21:27, Pali Rohár wrote:
> On Thursday 03 June 2021 21:03:55 Chris Packham wrote:
>> On 4/06/21 12:55 am, Pali Rohár wrote:
>>> On Wednesday 08 January 2020 19:42:12 Chris Packham wrote:
>>>> Hi Gregory,
>>>>
>>>> On Wed, 2020-01-08 at 11:22 +0100, Gregory CLEMENT wrote:
>>>>> Hello Chris,
>>>>>
>>>>>> U-boot disables MBUS error propagation for Armada-385. The effect of
>>>>>> this on the kernel is that any access to a mapped but inaccessible
>>>>>> address causes the system to hang.
>>>>>>
>>>>>> By enabling MBUS error propagation the kernel can raise a Bus Error and
>>>>>> panic to restart the system.
>>>>> Unless I miss it, it seems that nobody comment this patch: sorry for the
>>>>> delay.
>>>>>
>>>> Thanks for the response.
>>>>
>>>>>> Signed-off-by: Chris Packham <chris.packham@xxxxxxxxxxxxxxxxxxx>
>>>>>> ---
>>>>>>
>>>>>> Notes:
>>>>>> We've encountered an issue where rogue accesses to PCI-e space cause an
>>>>>> Armada-385 system to lockup. We've found that enabling MBUS error
>>>>>> propagation lets us get a bus error which at least gives us a panic to
>>>>>> help identify what was accessed.
>>>>>>
>>>>>> U-boot clears the IO Err Prop Enable Bit[1] but so far no-one seems to
>>>>>> know why.
>>>>>>
>>>>>> I wasn't sure where to put this code. There is similar code for kirwood
>>>>>> in the equivalent dt_init function. On Armada-XP the register is part of
>>>>>> the Core Coherency Fabric block (for A385 it's documented as part of the
>>>>>> CCF block).
>>>>> What about adding a new set of register to the mvebu mbus driver?
>>>>>
>>>> After more testing we found that some previously "good" boards started
>>>> throwing up panics with this change. I think that this might require
>>>> handling some of the PCI-e interrupts (for correctable errors) via the
>>>> EDAC subsystem.
>>>>
>>>> We're still working with Marvell to track down exactly why this is
>>>> happening on our system.
>>> Hello Chris! Have you somehow solved this issue? Or do you have some
>>> contacts in Marvell for A385 PCIe subsystem?
>> The problem seemed to be a specific CPU Erratum for the A385 which I
>> brought into upstream u-boot[1].
> Hello Chris, could you point me to this CPU Erratum? What is its number
> or some other identifier? Because I'm unable to find any such erratum in
> the list for A385 erratums.

I haven't seen the actual errata either. My comment is based on the
following change from one of Marvell's u-boot forks

https://github.com/MarvellEmbeddedProcessors/u-boot-marvell/commit/20cd2704072512de176e048970f2883db901674b

It just says "ERRATA# TBD". Perhaps they just haven't published a new
document (latest I can find is Rev D from 2016) or they decided it was a
SW bug and didn't assign it an official errata number.

>> But even then we had to update our base
>> u-boot version from 2016.11 to 2019.10 so there may have been more going
>> on than just that one change.
>>
>> [1] -
>> https://scanmail.trustwave.com/?c=20988&d=8fTi4h2MjPVKSgkegODmtLtuBcIeiZ2hXGl9DulH7A&u=https%3a%2f%2fsource%2edenx%2ede%2fu-boot%2fu-boot%2f-%2fcommit%2fad91fdfff0bd6ea471afe838e0f6d58ed898694e
>>
>>>>> In this case it will be called even earlier allowing to see bus error
>>>>> earlier.
>>>>>
>>>>> In any case, you should separate the device tree change from the code
>>>>> change and at least provide 2 patches.
>>>> Agreed. If/when something solid eventuates we'll do it as a proper
>>>> series.
>>>>
>>>>> Gregory
>>>>>
>>>>>>
>>>>>> --
>>>>>> [1] - https://scanmail.trustwave.com/?c=20988&d=8fTi4h2MjPVKSgkegODmtLtuBcIeiZ2hXGl-X75J5A&u=https%3a%2f%2fgitlab%2edenx%2ede%2fu-boot%2fu-boot%2fblob%2fmaster%2farch%2farm%2fmach-mvebu%2fcpu%2ec%23L489
>> ^^^ sorry, as a result of a reasonably high profile cyber attack on a
>> local institution URLs in our incoming mail are intercepted.
>>>>>> arch/arm/boot/dts/armada-38x.dtsi | 5 +++++
>>>>>> arch/arm/mach-mvebu/board-v7.c | 27 +++++++++++++++++++++++++++
>>>>>> 2 files changed, 32 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/arm/boot/dts/armada-38x.dtsi b/arch/arm/boot/dts/armada-38x.dtsi
>>>>>> index 3f4bb44d85f0..3214c67433eb 100644
>>>>>> --- a/arch/arm/boot/dts/armada-38x.dtsi
>>>>>> +++ b/arch/arm/boot/dts/armada-38x.dtsi
>>>>>> @@ -386,6 +386,11 @@
>>>>>> <0x20250 0x8>;
>>>>>> };
>>>>>>
>>>>>> + ioerrc: io-err-control@20200 {
>>>>>> + compatible = "marvell,io-err-control";
>>>>>> + reg = <0x20200 0x4>;
>>>>>> + };
>>>>>> +
>>>>>> mpic: interrupt-controller@20a00 {
>>>>>> compatible = "marvell,mpic";
>>>>>> reg = <0x20a00 0x2d0>, <0x21070 0x58>;
>>>>>> diff --git a/arch/arm/mach-mvebu/board-v7.c b/arch/arm/mach-mvebu/board-v7.c
>>>>>> index d2df5ef9382b..fb7718386ef9 100644
>>>>>> --- a/arch/arm/mach-mvebu/board-v7.c
>>>>>> +++ b/arch/arm/mach-mvebu/board-v7.c
>>>>>> @@ -138,10 +138,36 @@ static void __init i2c_quirk(void)
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> +#define MBUS_ERR_PROP_EN BIT(8)
>>>>>> +
>>>>>> +/*
>>>>>> + * U-boot disables MBUS error propagation. Re-enable it so we
>>>>>> + * can handle them as Bus Errors.
>>>>>> + */
>>>>>> +static void __init enable_mbus_error_propagation(void)
>>>>>> +{
>>>>>> + struct device_node *np =
>>>>>> + of_find_compatible_node(NULL, NULL, "marvell,io-err-control");
>>>>>> +
>>>>>> + if (np) {
>>>>>> + void __iomem *reg;
>>>>>> +
>>>>>> + reg = of_iomap(np, 0);
>>>>>> + if (reg) {
>>>>>> + u32 val;
>>>>>> +
>>>>>> + val = readl_relaxed(reg);
>>>>>> + writel_relaxed(val | MBUS_ERR_PROP_EN, reg);
>>>>>> + }
>>>>>> + of_node_put(np);
>>>>>> + }
>>>>>> +}
>>>>>> +
>>>>>> static void __init mvebu_dt_init(void)
>>>>>> {
>>>>>> if (of_machine_is_compatible("marvell,armadaxp"))
>>>>>> i2c_quirk();
>>>>>> + enable_mbus_error_propagation();
>>>>>> }
>>>>>>
>>>>>> static void __init armada_370_xp_dt_fixup(void)
>>>>>> @@ -191,6 +217,7 @@ DT_MACHINE_START(ARMADA_38X_DT, "Marvell Armada 380/385 (Device Tree)")
>>>>>> .l2c_aux_val = 0,
>>>>>> .l2c_aux_mask = ~0,
>>>>>> .init_irq = mvebu_init_irq,
>>>>>> + .init_machine = mvebu_dt_init,
>>>>>> .restart = mvebu_restart,
>>>>>> .dt_compat = armada_38x_dt_compat,
>>>>>> MACHINE_END
>>>>>> --
>>>>>> 2.24.0
>>>>>>