Re: [PATCH v4 02/11] PCI/bwctrl: Add support to scale bandwidth before & after link re-training

From: Manivannan Sadhasivam
Date: Tue Aug 12 2025 - 12:45:33 EST


On Tue, Aug 12, 2025 at 09:35:46AM GMT, Krishna Chaitanya Chundru wrote:
>
>
> On 7/22/2025 4:33 PM, Krishna Chaitanya Chundru wrote:
> >
> >
> > On 7/12/2025 4:36 AM, Krishna Chaitanya Chundru wrote:
> > >
> > >
> > > On 7/12/2025 3:06 AM, Bjorn Helgaas wrote:
> > > > On Mon, Jun 09, 2025 at 04:21:23PM +0530, Krishna Chaitanya
> > > > Chundru wrote:
> > > > > If the driver wants to move to higher data rate/speed than
> > > > > the current data
> > > > > rate then the controller driver may need to change certain
> > > > > votes so that
> > > > > link may come up at requested data rate/speed like QCOM PCIe
> > > > > controllers
> > > > > need to change their RPMh (Resource Power Manager-hardened) state. Once
> > > > > link retraining is done controller drivers needs to adjust their votes
> > > > > based on the final data rate.
> > > > >
> > > > > Some controllers also may need to update their bandwidth voting like
> > > > > ICC BW votings etc.
> > > > >
> > > > > So, add pre_link_speed_change() & post_link_speed_change() op to call
> > > > > before & after the link re-train. There is no explicit
> > > > > locking mechanisms
> > > > > as these are called by a single client Endpoint driver.
> > > > >
> > > > > In case of PCIe switch, if there is a request to change
> > > > > target speed for a
> > > > > downstream port then no need to call these function ops as these are
> > > > > outside the scope of the controller drivers.
> > > >
> > > > > +++ b/include/linux/pci.h
> > > > > @@ -599,6 +599,24 @@ struct pci_host_bridge {
> > > > >       void (*release_fn)(struct pci_host_bridge *);
> > > > >       int (*enable_device)(struct pci_host_bridge *bridge,
> > > > > struct pci_dev *dev);
> > > > >       void (*disable_device)(struct pci_host_bridge *bridge,
> > > > > struct pci_dev *dev);
> > > > > +    /*
> > > > > +     * Callback to the host bridge drivers to update ICC BW
> > > > > votes, clock
> > > > > +     * frequencies etc.. for the link re-train to come up
> > > > > in targeted speed.
> > > > > +     * These are intended to be called by devices directly
> > > > > attached to the
> > > > > +     * Root Port. These are called by a single client
> > > > > Endpoint driver, so
> > > > > +     * there is no need for explicit locking mechanisms.
> > > > > +     */
> > > > > +    int (*pre_link_speed_change)(struct pci_host_bridge *bridge,
> > > > > +                     struct pci_dev *dev, int speed);
> > > > > +    /*
> > > > > +     * Callback to the host bridge drivers to adjust ICC BW
> > > > > votes, clock
> > > > > +     * frequencies etc.. to the updated speed after link
> > > > > re-train. These
> > > > > +     * are intended to be called by devices directly attached to the
> > > > > +     * Root Port. These are called by a single client Endpoint driver,
> > > > > +     * so there is no need for explicit locking mechanisms.
> > > >
> > > > No need to repeat the entire comment.  s/.././
> > > >
> > > > These pointers feel awfully specific for being in struct
> > > > pci_host_bridge, since we only need them for a questionable QCOM
> > > > controller.  I think this needs to be pushed down into qcom somehow as
> > > > some kind of quirk.
> > > >
> > > Currently these are needed by QCOM controllers, but it may also needed
> > > by other controllers may also need these for updating ICC votes, any
> > > system level votes, clock frequencies etc.
> > > QCOM controllers is also doing one extra step in these functions to
> > > disable and enable ASPM only as it cannot link speed change support
> > > with ASPM enabled.
> > >
> > Bjorn, can you check this.
> >
> > For QCOM devices we need to update the RPMh vote i.e a power source
> > votes for the link to come up in required speed. and also we need
> > to update interconnect votes also. This will be applicable for
> > other vendors also.
> >
> > If this is not correct place I can add them in the pci_ops.
> Bjorn,
>
> Can you please comment on this.
>
> Is this fine to move these to the pci_ops of the bridge.
> Again these are not specific to QCOM, any controller driver which
> needs to change their clock rates, ICC bw votes etc needs to have
> these.
>

No, moving to 'pci_ops' is terrible than having it in 'pci_host_bridge' IMO. If
we want to get rid of these ops, we can introduce a quirk flag in
'pci_host_bridge' and when set, the bwctrl code can disable/enable ASPM
before/after link retrain. This clearly states that the controller is quirky and
we need to disable/enable ASPM.

For setting OPP, you can have a separate callback in 'pci_host_bridge' that just
allows setting OPP *after* retrain, like 'pci_host_bridge:link_change_notify()'.
I don't think you really need to set OPP before retrain though. As even if you
do it pre and post link retrain, there is still a small window where the link
will operate without adequate vote.

- Mani

--
மணிவண்ணன் சதாசிவம்