Re: [PATCH] PCI/pwrctrl: Only destroy alongside host bridge
From: Brian Norris
Date: Wed Jul 16 2025 - 12:48:12 EST
On Wed, Jul 16, 2025 at 09:27:55PM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jul 15, 2025 at 02:21:47PM GMT, Brian Norris wrote:
> > OTOH, I also see that part of my change is not really doing quite what I
> > thought it was -- so far, I think there may be some kind of resource
> > leak (kobj ref), since I'm not seeing pci_release_host_bridge_dev()
> > called when I think it should be. If I perform cleanup in
> > pci_free_host_bridge() instead, then I do indeed see
> > of_platform_device_destroy() tear things down the way I expect.
> >
>
> Oh, that's bad! Which controller it is? I played with making the pcie-qcom
> driver modular and I unloaded/loaded multiple times, but never saw any
> refcount warning (I really hope if there was any leak, it would've tripped over
> during insmod).
I'm still trying to tease this apart, and I'm not sure when I'll have
plenty of time to get further on this. I'm also primarily using a
non-upstream DWC-based driver, which isn't really ready to be published.
I also have some systems that use
drivers/pci/controller/pcie-rockchip-host.c and are fully
upstream-supported, so I'll see if I can replicate my observations
there.
But I think there are at least two problems:
(1) I'm adding code to bridge->dev.release(). release() is only called
when the device's refcount drops to zero. And child devices hold a
refcount on their parent (the bridge). So, I have a circular
refcount, if there were any pwrctrl children present.
I think this is easily solved by moving the child destruction to
pci_free_host_bridge() instead.
(2) Even after resolving 1, I'm seeing pci_free_host_bridge() exit with
a bridge->dev.kboj.kref refcount of 1 in some cases. I don't yet
have an explanation of that one.
IIUC, this kind of error would be considered a leak, but crucially, I
also don't think it would produce any kind of refcount warning or other
error. It's "just" a device that has been removed (a la, device_del()),
but still has some client holding a reference count (i.e., not enough
put_device()).
Brian