Re: Boot failure on gru-scarlet-inx with 5.9-rc2

From: Marc Zyngier
Date: Tue Sep 01 2020 - 11:39:37 EST


On 2020-09-01 04:45, Samuel Dionne-Riel wrote:
On Mon, 31 Aug 2020 10:27:37 +0100
Marc Zyngier <maz@xxxxxxxxxx> wrote:

Ah, so actually anything that *enables pcie* kills your system.
Great investigative work!

>
> And backed by a further bisection with this that points to
> d84c572de1a360501d2e439ac632126f5facf59d being the actual change
> that causes the tablet to fail to boot, as long as the pcie0 node is
> identified as pci properly.
>
> I am unsure if I should add as a Cc everyone involved in that change
> set, though the author (coincidentally) is already in the original
> list of recipients.

I've deliberately moved Rob from Cc to To... ;-)

Thanks, I don't actually know who to write to exactly.

Given that this is a PCI regression, I guess the PCI maintainers
are the likely victims. Adding Bjorn and Lorenzo to the list in
addition to Rob.

You can find the relevant people by looking at the MAINTAINERS
file.

> Any additional thoughts from this additional information?

What you could do is to start looking at which of the
pci_is_root_bus() changes breaks PCIe on this system. The fact that
it breaks on your system and not on mine is a bit puzzling.

Let me show you, on top of v5.9-rc3 I can successfully boot using this
partial revert / adaptation of d84c572d. In addition, it also allows
the Wi-Fi to work again, compared to how it didn't in 5.9-rc1 or
5.9-rc[23] with the dumb revert of your fix.

So, if we number each pci_is_root_bus by order appearance, it is only
the second use, in rockchip_pcie_valid_device, which seem to cause
scarlet not to boot.

The patch (not actually a patch submission) reverts only that instance
of pci_is_root_bus, while also doing some leg work to put back some
functionally equivalent code that was refactored away since.

If there's anything else you want me to try, don't hesitate.

---
drivers/pci/controller/pcie-rockchip-host.c | 8 +++++++-
drivers/pci/controller/pcie-rockchip.h | 1 +
2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/pcie-rockchip-host.c
b/drivers/pci/controller/pcie-rockchip-host.c
index 0bb2fb3e8a0b..5a27fa833fbd 100644
--- a/drivers/pci/controller/pcie-rockchip-host.c
+++ b/drivers/pci/controller/pcie-rockchip-host.c
@@ -79,7 +79,7 @@ static int rockchip_pcie_valid_device(struct
rockchip_pcie *rockchip,
* do not read more than one device on the bus directly attached
* to RC's downstream side.
*/
- if (pci_is_root_bus(bus->parent) && dev > 0)
+ if (bus->primary == rockchip->root_bus_nr && dev > 0)
return 0;

return 1;
@@ -944,6 +944,7 @@ static int rockchip_pcie_probe(struct platform_device *pdev)
struct rockchip_pcie *rockchip;
struct device *dev = &pdev->dev;
struct pci_host_bridge *bridge;
+ struct resource *bus_res;
int err;

if (!dev->of_node)
@@ -983,6 +984,11 @@ static int rockchip_pcie_probe(struct
platform_device *pdev)
if (err < 0)
goto err_deinit_port;

+ /* HACK; ~equiv to last param of pci_parse_request_of_pci_ranges */
+ bus_res = (resource_list_first_type(&bridge->windows, IORESOURCE_MEM))->res;
+
+ rockchip->root_bus_nr = bus_res->start;
+
err = rockchip_pcie_cfg_atu(rockchip);
if (err)
goto err_remove_irq_domain;
diff --git a/drivers/pci/controller/pcie-rockchip.h
b/drivers/pci/controller/pcie-rockchip.h
index c7d0178fc8c2..0952fec7e34d 100644
--- a/drivers/pci/controller/pcie-rockchip.h
+++ b/drivers/pci/controller/pcie-rockchip.h
@@ -298,6 +298,7 @@ struct rockchip_pcie {
struct gpio_desc *ep_gpio;
u32 lanes;
u8 lanes_map;
+ u8 root_bus_nr;
int link_gen;
struct device *dev;
struct irq_domain *irq_domain;
--
2.25.4


Thanks again!

Hmmm. It seems that the original commit (d84c572d) considered that
root_bus_nr was always zero, while it may not have been.

Rob, Lorenzo: do you guys have any idea what is going on here?

Thanks,

M.
--
Jazz is not dead. It just smells funny...