Re: [PATCH V2] PCI/ASPM: Skip L1SS save/restore if not already enabled

From: Bjorn Helgaas
Date: Wed Feb 08 2023 - 18:42:39 EST


On Fri, Jan 20, 2023 at 02:45:40PM +0530, Vidya Sagar wrote:
> Skip save and restore of ASPM L1 Sub-States specific registers if they
> are not already enabled in the system. This is to avoid issues observed
> on certain platforms during restoration process, particularly when
> restoring the L1SS registers contents.
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216782
> Signed-off-by: Vidya Sagar <vidyas@xxxxxxxxxx>
> ---
> v2:
> * Address review comments from Kai-Heng Feng and Rafael
>
> drivers/pci/pcie/aspm.c | 17 ++++++++++++++++-
> include/linux/pci.h | 1 +
> 2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 53a1fa306e1e..bd2a922081bd 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -761,11 +761,23 @@ void pci_save_aspm_l1ss_state(struct pci_dev *dev)
>  {
>  	struct pci_cap_saved_state *save_state;
>  	u16 l1ss = dev->l1ss;
> -	u32 *cap;
> +	u32 *cap, val;
>
>  	if (!l1ss)
>  		return;
>
> +	/*
> +	 * Skip save and restore of L1 Sub-States registers if they are not
> +	 * already enabled in the system
> +	 */
> +	pci_read_config_dword(dev, l1ss + PCI_L1SS_CTL1, &val);
> +	if (!(val & PCI_L1SS_CTL1_L1SS_MASK)) {
> +		dev->skip_l1ss_restore = true;
> +		return;
> +	}

I think this fix is still problematic. PCIe r6.0, sec 5.5.4, requires
that

  If setting either or both of the enable bits for ASPM L1 PM
  Substates, both ports must be configured as described in this
  section while ASPM L1 is disabled.

The current Linux code does not observe this because ASPM L1 is
enabled by PCI_EXP_LNKCTL (in the PCIe Capability Link Control
register), while ASPM L1 PM Substate configuration is in PCI_L1SS_CTL1
(in the L1 PM Substates Capability), and these two things are not
integrated:

  pci_restore_state
    pci_restore_aspm_l1ss_state
      aspm_program_l1ss
        pci_write_config_dword(PCI_L1SS_CTL1, ctl1)  # L1SS restore
    pci_restore_pcie_state
      pcie_capability_write_word(PCI_EXP_LNKCTL, cap[i++])  # L1 restore

So I suspect the problem is that we're writing PCI_L1SS_CTL1 while
ASPM L1 is enabled, and the device gets confused somehow.

I think it would be better to change this restore flow to follow that
spec requirement instead of skipping the save/restore like this.
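
To make the ordering concrete, here is a rough userspace sketch, not
kernel code. It assumes a mock device (struct mock_dev and the mock_*
and restore_* helpers are hypothetical names invented for this
illustration) that merely records whether the L1SS enable bits were
written while ASPM L1 was enabled. Only the two bit definitions are
taken from pci_regs.h. It models a single port, whereas the spec
requirement covers both ends of the link.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCTL_ASPM_L1   0x0002      /* ASPM L1 enable in Link Control */
#define PCI_L1SS_CTL1_L1SS_MASK  0x0000000f  /* the four L1SS enable bits */

/* Hypothetical stand-in for one port's registers; not a kernel type. */
struct mock_dev {
	uint16_t lnkctl;     /* PCIe Capability Link Control */
	uint32_t l1ss_ctl1;  /* L1 PM Substates Control 1 */
	uint32_t l1ss_ctl2;  /* L1 PM Substates Control 2 */
	bool violated;       /* L1SS enables were set while ASPM L1 was enabled */
};

static void mock_write_lnkctl(struct mock_dev *dev, uint16_t val)
{
	dev->lnkctl = val;
}

static void mock_write_l1ss_ctl2(struct mock_dev *dev, uint32_t val)
{
	dev->l1ss_ctl2 = val;
}

static void mock_write_l1ss_ctl1(struct mock_dev *dev, uint32_t val)
{
	/* Record the condition the spec text above forbids. */
	if ((val & PCI_L1SS_CTL1_L1SS_MASK) &&
	    (dev->lnkctl & PCI_EXP_LNKCTL_ASPM_L1))
		dev->violated = true;
	dev->l1ss_ctl1 = val;
}

/* Same ordering as the call chain above: L1SS first, Link Control last. */
static void restore_l1ss_first(struct mock_dev *dev, uint16_t lnkctl,
			       uint32_t ctl1, uint32_t ctl2)
{
	assert(dev);
	mock_write_l1ss_ctl2(dev, ctl2);
	mock_write_l1ss_ctl1(dev, ctl1);
	mock_write_lnkctl(dev, lnkctl);
}

/* Alternative ordering: clear ASPM L1, program L1SS, then restore L1. */
static void restore_l1_disabled_first(struct mock_dev *dev, uint16_t lnkctl,
				      uint32_t ctl1, uint32_t ctl2)
{
	assert(dev);
	mock_write_lnkctl(dev, (uint16_t)(dev->lnkctl & ~PCI_EXP_LNKCTL_ASPM_L1));
	mock_write_l1ss_ctl2(dev, ctl2);
	mock_write_l1ss_ctl1(dev, ctl1);
	mock_write_lnkctl(dev, lnkctl);
}
```

If the mock starts with ASPM L1 already enabled in lnkctl,
restore_l1ss_first() sets the violated flag and
restore_l1_disabled_first() does not, while both end with the same
register contents. Whether real hardware misbehaves in the first case
is exactly the open question here.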

I hesitate to even include the patch below because it's clearly not a
real fix, but if the system does resume and we see this message, it
would be a good clue that this is what's happening.

Bjorn

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 53a1fa306e1e..c8349b1f982f 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -779,7 +779,7 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
 {
 	struct pci_cap_saved_state *save_state;
 	u32 *cap, ctl1, ctl2;
-	u16 l1ss = dev->l1ss;
+	u16 ctl, l1ss = dev->l1ss;
 
 	if (!l1ss)
 		return;
@@ -788,6 +788,13 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
 	if (!save_state)
 		return;
 
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &ctl);
+	if (ctl & PCI_EXP_LNKCTL_ASPM_L1) {
+		pci_info(dev, "ASPM: can't restore L1SS while L1 enabled (%#06x)\n",
+			 ctl);
+		return;
+	}
+
 	cap = (u32 *)&save_state->cap.data[0];
 	ctl2 = *cap++;
 	ctl1 = *cap;