Re: [PATCH v2 2/2] PCI: pciehp: Abort hot-plug if pci_hp_add_bridge() fails

From: Nam Cao
Date: Tue May 07 2024 - 10:28:14 EST


On Mon, May 06, 2024 at 09:36:44PM +0200, Lukas Wunner wrote:
> On Mon, May 06, 2024 at 10:37:01AM +0200, Nam Cao wrote:
> > I just discovered a new crash scenario with shpchp:
> >
> > First, hot-add a bridge:
> > (qemu) device_add pci-bridge,id=br2,bus=br1,chassis_nr=1,addr=1
> >
> > But with the current patch, the hot-plug is still successful, just that the
> > bridge is not properly configured:
> > $ lspci -t
> > -[0000:00]-+-00.0
> > +-01.0
> > +-02.0
> > +-03.0-[01-02]----00.0-[02]----01.0-- <-- new hot-added bridge
> > +-1f.0
> > +-1f.2
> > \-1f.3
> >
> > But! Now I leave the hot-added bridge as is, and hot-add an ethernet card:
> > (qemu) device_add e1000,bus=br1,id=eth0,addr=2
> >
> > this command crashes the kernel (full crash log below).
> >
> > The problem is because during hot-plugging of this ethernet card,
> > pci_bus_add_devices() is invoked and the previously hot-plugged bridge hasn't
> > been marked as "added" yet, so the driver attempts to add this bridge
> > again, leading to the crash.
> >
> > Now for pciehp, this scenario cannot happen, because there is only a single
> > slot, so we cannot hot-plug another device. But still, this makes me think
> > perhaps we shouldn't leave the hot-plugged bridge in a improperly-configured
> > state as this patch is doing. I cannot think of a scenario that would crash pciehp
> > similarly to shpchp. But that's just me, maybe there is a rare scenario
> > that can crash pciehp, but I just haven't seen yet.
> >
> > My point is: better safe than sorry. I propose going back to the original
> > solution: calling pci_stop_and_remove_bus_device() and return a negative
> > error, and get rid of this improperly configured bridge. It is useless
> > anyways without a subordinate bus number.
> >
> > What do you think?
>
> When the kernel runs out of BAR address space for devices downstream of
> a bridge, it doesn't de-enumerate the bridge. The devices are just
> unusable.
>
> So my first intuition is that running out of bus numbers shouldn't result
> in de-enumeration either.

The BAR case is a bit different: it is a legal configuration for a bridge
to have no address space downstream: we can configure the bridge with
limit<base to indicate that there is no downstream addresses. There is
nothing similar for secondary bus number: there is no legal bus number
configuration for the bridge in this case.

> Remind me, how exactly does the NULL pointer deref occur? I think it's
> because no struct pci_bus was allocated for the subordinate bus of the
> hot-plugged bridge, right? Because AFAICS that would happen in
>
> pci_hp_add_bridge()
> pci_can_bridge_extend()
> pci_add_new_bus()
> pci_alloc_child_bus()
>
> but we never get that far because pci_hp_add_bridge() bails out with -1.
> So the subordinate pointer remains a NULL pointer.

This is correct. NULL deference happens due to subordinate pointer being
NULL.

> Perhaps we should allocate an empty struct pci_bus instead.

This feels a bit hacky. What bus number could we set to this struct? And I
doubt that the PCI subsystem is written with the expectation that struct pci_bus
can be a dummy one, so I would guess that the probability of breaking thing
is high with this approach.

> Or check for a NULL subordinate pointer instead of crashing.

I think this is a possible solution, but it is a bit complicated: all usage
of subordinate pointers will need to be looked at. Further more, secondary bus
number being zero is currently taken as unconfigured/invalid (for example
in pci_scan_bridge_extend()), that needs to be changed as well. Doing this
has a good chance of breaking things.

I don't really see the upside of the above, compared to just deleting this
bridge. It is not really counter-intuitive that the OS rejects a
hot-plugged device that cannot be configured, right? Also this solution is
wayyy simpler. Unless there is problem with this that I am not seeing?

Best regards,
Nam