Re: [PATCH 2/4] PCI: pciehp: bail out if pci_hp_add_bridge() fails

From: Nam Cao
Date: Sat May 04 2024 - 05:35:50 EST


On Sat, May 04, 2024 at 10:54:15AM +0200, Lukas Wunner wrote:
> On Fri, May 03, 2024 at 09:23:20PM +0200, Nam Cao wrote:
> > If there is no bus number available for the downstream bus of the
> > hot-plugged bridge, pci_hp_add_bridge() will fail. The driver proceeds
> > regardless, and the kernel crashes.
> >
> > Abort if pci_hp_add_bridge() fails.
> [...]
> > --- a/drivers/pci/hotplug/pciehp_pci.c
> > +++ b/drivers/pci/hotplug/pciehp_pci.c
> > @@ -58,8 +58,13 @@ int pciehp_configure_device(struct controller *ctrl)
> >  		goto out;
> >  	}
> >  
> > -	for_each_pci_bridge(dev, parent)
> > -		pci_hp_add_bridge(dev);
> > +	for_each_pci_bridge(dev, parent) {
> > +		if (pci_hp_add_bridge(dev)) {
> > +			pci_stop_and_remove_bus_device(dev);
> > +			ret = -EINVAL;
> > +			goto out;
> > +		}
> > +	}
>
> Is the pci_stop_and_remove_bus_device() really necessary here?
> Why not just leave the bridge as is, without any child devices?

pci_stop_and_remove_bus_device() is not necessary to prevent the kernel
from crashing. But without it, we cannot hot-plug any other device into
this slot afterward, even though the bridge has already been removed.
Below is what happens without pci_stop_and_remove_bus_device().

First, we hot-plug a bridge. That fails, so QEMU removes the bridge:
(qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=19,addr=1
[ 9.289609] shpchp 0000:01:00.0: Latch close on Slot(1-1)
[ 9.291145] shpchp 0000:01:00.0: Button pressed on Slot(1-1)
[ 9.292705] shpchp 0000:01:00.0: Card present on Slot(1-1)
[ 9.294369] shpchp 0000:01:00.0: PCI slot #1-1 - powering on due to button press
[ 15.529997] pci 0000:02:01.0: [1b36:0001] type 01 class 0x060400 conventional PCI bridge
[ 15.533907] pci 0000:02:01.0: BAR 0 [mem 0x00000000-0x000000ff 64bit]
[ 15.535802] pci 0000:02:01.0: PCI bridge to [bus 00]
[ 15.538519] pci 0000:02:01.0: bridge window [io 0x0000-0x0fff]
[ 15.540261] pci 0000:02:01.0: bridge window [mem 0x00000000-0x000fffff]
[ 15.543486] pci 0000:02:01.0: bridge window [mem 0x00000000-0x000fffff 64bit pref]
[ 15.547151] pci 0000:02:01.0: No bus number available for hot-added bridge
[ 15.549067] shpchp 0000:01:00.0: Cannot add device at 0000:02:01
[ 15.553104] shpchp 0000:01:00.0: Latch open on Slot(1-1)
[ 15.555246] shpchp 0000:01:00.0: Card not present on Slot(1-1)

Then, we hot-plug an ethernet device. But the kernel incorrectly thinks
the bridge is still there, and refuses the new ethernet device:
(qemu) device_add e1000,bus=br1,addr=1
[ 58.163529] shpchp 0000:01:00.0: Latch close on Slot(1-1)
[ 58.165076] shpchp 0000:01:00.0: Button pressed on Slot(1-1)
[ 58.166650] shpchp 0000:01:00.0: Card present on Slot(1-1)
[ 58.168287] shpchp 0000:01:00.0: PCI slot #1-1 - powering on due to button press
[ 64.677492] shpchp 0000:01:00.0: Device 0000:02:01.0 already exists at 0000:02:01, cannot hot-add
[ 64.680007] shpchp 0000:01:00.0: Cannot add device at 0000:02:01
[ 64.682802] shpchp 0000:01:00.0: Latch open on Slot(1-1)
[ 64.684353] shpchp 0000:01:00.0: Card not present on Slot(1-1)

Best regards,
Nam