Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken
From: Timothy Pearson
Date: Thu Jun 19 2025 - 15:29:53 EST
----- Original Message -----
> From: "Bjorn Helgaas" <helgaas@xxxxxxxxxx>
> To: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> Cc: "linuxppc-dev" <linuxppc-dev@xxxxxxxxxxxxxxxx>, "linux-kernel" <linux-kernel@xxxxxxxxxxxxxxx>, "linux-pci"
> <linux-pci@xxxxxxxxxxxxxxx>, "Madhavan Srinivasan" <maddy@xxxxxxxxxxxxx>, "Michael Ellerman" <mpe@xxxxxxxxxxxxxx>,
> "christophe leroy" <christophe.leroy@xxxxxxxxxx>, "Naveen N Rao" <naveen@xxxxxxxxxx>, "Bjorn Helgaas"
> <bhelgaas@xxxxxxxxxx>, "Shawn Anastasio" <sanastasio@xxxxxxxxxxxxxxxxxxxxx>, "Lukas Wunner" <lukas@xxxxxxxxx>
> Sent: Wednesday, June 18, 2025 3:17:22 PM
> Subject: Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with broken
> On Wed, Jun 18, 2025 at 02:50:04PM -0500, Timothy Pearson wrote:
>> ----- Original Message -----
>> > From: "Bjorn Helgaas" <helgaas@xxxxxxxxxx>
>> > To: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
>> > Cc: "linuxppc-dev" <linuxppc-dev@xxxxxxxxxxxxxxxx>, "linux-kernel"
>> > <linux-kernel@xxxxxxxxxxxxxxx>, "linux-pci"
>> > <linux-pci@xxxxxxxxxxxxxxx>, "Madhavan Srinivasan" <maddy@xxxxxxxxxxxxx>,
>> > "Michael Ellerman" <mpe@xxxxxxxxxxxxxx>,
>> > "christophe leroy" <christophe.leroy@xxxxxxxxxx>, "Naveen N Rao"
>> > <naveen@xxxxxxxxxx>, "Bjorn Helgaas"
>> > <bhelgaas@xxxxxxxxxx>, "Shawn Anastasio" <sanastasio@xxxxxxxxxxxxxxxxxxxxx>,
>> > "Lukas Wunner" <lukas@xxxxxxxxx>
>> > Sent: Wednesday, June 18, 2025 2:44:00 PM
>> > Subject: Re: [PATCH v2 2/6] pci/hotplug/pnv_php: Work around switches with
>> > broken
>>
>> > [+cc Lukas, pciehp expert]
>> >
>> > On Wed, Jun 18, 2025 at 11:56:54AM -0500, Timothy Pearson wrote:
>> >> presence detection
>> >
>> > (subject/commit wrapping seems to be on all of these patches)
>> >
>> >> The Microsemi Switchtec PM8533 PFX 48xG3 [11f8:8533] PCIe switch system
>> >> was observed to incorrectly assert the Presence Detect Set bit in its
>> >> capabilities when tested on a Raptor Computing Systems Blackbird system,
>> >> resulting in the hot insert path never attempting a rescan of the bus
>> >> and any downstream devices not being re-detected.
>> >
>> > Seems like this switch supports standard PCIe hotplug? Quite a bit of
>> > this driver looks similar to things in pciehp. Is there some reason
>> > we can't use pciehp directly? Maybe pciehp could work if there were
>> > hooks for the PPC-specific bits?
>>
>> While that is a good long term goal that Raptor is willing to work
>> toward, it is non-trivial and will require buy-in from other
>> stakeholders (e.g. IBM). If practical, I'd like to get this series
>> merged first, to fix the broken hotplug on our hardware that is
>> deployed worldwide, then in parallel see what can be done to merge
>> PowerNV support into pciehp. Would that work?
>
> Yeah, it wouldn't make sense to switch horses at this stage.
>
> I guess I was triggered by this patch, which seems to be a workaround
> for a defect in a device that is probably also used on non-PPC
> systems, and pciehp would need a similar workaround. But I guess you
> go on to say that pciehp already does something similar, so I guess
> it's already covered.
No problem, I completely understand. To be perfectly frank, the existing code quality in this driver (and the associated EEH driver) is not the best, and it's been a frustrating experience trying to hack it into semi-stable operation. I would vastly prefer to rewrite it and integrate it into the pciehp driver, and we have plans to do so, but that will take an unacceptable amount of time compared with fixing up the existing driver as a stopgap.
As you mentioned, pciehp already has this fix, so we just have to deal with the duplicated code until we (Raptor) figure out how to merge PowerNV support into pciehp.