Re: [PATCH v2] irqchip/bcm2835: Quiesce IRQs left enabled by bootloader

From: Florian Fainelli
Date: Tue Feb 11 2020 - 23:50:15 EST




On 2/10/2020 1:52 AM, Lukas Wunner wrote:
> Customers of our "Revolution Pi" open source PLCs (which are based on
> the Raspberry Pi) have reported random lockups as well as jittery eMMC,
> UART and SPI latency. We were able to reproduce the lockups in our lab
> and hooked up a JTAG debugger:
>
> It turns out that the USB controller's interrupt is already enabled when
> the kernel boots. All interrupts are disabled when the chip comes out
> of power-on reset, according to the spec. So apparently the bootloader
> enables the interrupt but neglects to disable it before handing over
> control to the kernel.
>
> The bootloader is a closed source blob provided by the Raspberry Pi
> Foundation. Development of an alternative open source bootloader was
> begun by Kristina Brooks but it's not fully functional yet. Usage of
> the blob is thus without alternative for the time being.
>
> The Raspberry Pi Foundation's downstream kernel has a performance-
> optimized USB driver (which we use on our Revolution Pi products).
> The driver takes advantage of the FIQ fast interrupt. Because the
> regular USB interrupt was left enabled by the bootloader, both the
> FIQ and the normal interrupt is enabled once the USB driver probes.
>
> The spec has the following to say on simultaneously enabling the FIQ
> and the normal interrupt of a peripheral:
>
> "One interrupt source can be selected to be connected to the ARM FIQ
> input. An interrupt which is selected as FIQ should have its normal
> interrupt enable bit cleared. Otherwise a normal and an FIQ interrupt
> will be fired at the same time. Not a good idea!"
> ^^^^^^^^^^^^^^^
> https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
> page 110
>
> On a multicore Raspberry Pi, the Foundation's kernel routes all normal
> interrupts to CPU 0 and the FIQ to CPU 1. Because both the FIQ and the
> normal interrupt is enabled, a USB interrupt causes CPU 0 to spin in
> bcm2836_chained_handle_irq() until the FIQ on CPU 1 has cleared it.
> Interrupts with a lower priority than USB are starved as long.
>
> That explains the jittery eMMC, UART and SPI latency: On one occasion
> I've seen CPU 0 blocked for no less than 2.9 msec. Basically,
> everything not USB takes a performance hit: Whereas eMMC throughput
> on a Compute Module 3 remains relatively constant at 23.5 MB/s with
> this commit, it irregularly dips to 23.0 MB/s without this commit.
>
> The lockups occur when CPU 0 receives a USB interrupt while holding a
> lock which CPU 1 is trying to acquire while the FIQ is temporarily
> disabled on CPU 1.
>
> I've tested old releases of the Foundation's bootloader as far back as
> 1.20160202-1 and they all leave the USB interrupt enabled. Still older
> releases fail to boot a contemporary kernel on a Compute Module 1 or 3,
> which are the only Raspberry Pi variants I have at my disposal for
> testing.
>
> Fix by disabling IRQs left enabled by the bootloader. Although the
> impact is most pronounced on the Foundation's downstream kernel,
> it seems prudent to apply the fix to the upstream kernel to guard
> against such mistakes in any present and future bootloader.
>
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: Serge Schneider <serge@xxxxxxxxxxxxxxx>
> Cc: Kristina Brooks <notstina@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx

It would be nice to provide a Fixes: tag so it gets backported to the
relevant -stable trees, this may be dating back to the first time the
driver was brought in tree. The commit message is a bit long and starts
going into details that I am not sure add anything, but FWIW:

Reviewed-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
--
Florian