RE: [PATCH v2] PCI: Fix up L1SS capability for Intel Apollolake PCIe bridge
From: Lee, Ron
Date: Tue Dec 20 2022 - 04:11:46 EST
> On Fri, Dec 16, 2022 at 04:29:39PM +0000, Lee, Ron wrote:
> > > On Thu, Dec 15, 2022 at 05:13:57PM +0800, Ron Lee wrote:
> Even if you haven't seen a battery life issue, I suspect you might be
> able to measure a power consumption difference if you looked for it
> and likely could see issues with manual ASPM enable/disable using
> sysfs. That might be a legitimate reason for this quirk, and if it is, we should mention it here.
We can arrange the power measurement, but I doubt this quirk has correlation to power consumption.
My point is that the ASPM behavior is not changed with or without this quirk.
>
> > The following merged commit can save/restore the
> > L1SubCap/L1SubCtl1/L1SubCtl2 registers for this bridge, However this
> > bridge not only lost its capability contents but also lost the link
> > to this capability.
> >
> > PCI/ASPM: Save/restore L1SS Capability for suspend/resume
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/c
> > om
> >
> mit/drivers/pci/pcie/aspm.c?h=v6.1&id=4257f7e008ea394fcecc050f1569c350
> > 3b8bcc15
>
> The current version of that code:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tre
> e/driver
> s/pci/pcie/aspm.c?id=v6.1#n760
> doesn't search for the L1SS capability; it uses dev->l1ss just like
> your patch does. So it should restore the capability even though the linked list is broken.
Vidya's patch definitely can restore the L1SS capability, the dev->l1ss is still valid and
L1SubCtl1/L1SubCtl2 are restored after resume. With Vidya's V4 patch I think although
the L1SS capability is missing after resume , it didn't change the ASPM behavior,
the ASPM function may still remain normal and same as before suspend.
But Vidya's patch didn't restore L1SS capability header and the pointer to L1SS capability,
this is the purpose of this quirk, it just reflect the bridge's current L1SS status authentically.
i.e. the user and developer can see the L1 PM substates through lspci command.
For reference, the following is PCI config dump of this bridge between suspend/resume.
Before Suspend:
...
150: 00 00 00 20 00 04 00 00 00 00 00 00 00 00 00 00 <-- point to L1SS capability at offset 0x200
160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 1e 00 01 22 1f 28 28 00 0f 28 03 60 f0 00 00 00 <-- here is L1SS capability
210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
220: 00 00 00 00 00 00 00 00 00 00 00 00 7f 7f 7f 7f
After Resume:
...
150: 00 00 00 00 00 04 00 00 00 00 00 00 00 00 00 00 <-- The pointer to L1SS capability was cleared
160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
1f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
200: 00 00 00 00 1f 28 28 00 0f 28 03 60 f0 00 00 00 <-- L1SS capability header was cleared too
>
> >
> > This issue could be and should be fixed by BIOS, however the
> > manufacturers have no resource for firmware validation and it's
> > risky for firmware update per their assessment.
>
> This fix is risky, too, because it writes to random places in config
> space and there's no guarantee that this is safe or even that the
> capabilities are at those locations.
Agree.
>
> > > Is there a bug report for this issue? Include the URL here.
> > >
> > > Is there a bug report for the firmware?
> > >
> > There is a Google's internal issue tracker for this bug, seems not
> > available for public.
>
> Maybe you can make a public report with any secret details removed?
> A simple email would be enough. I haven't seen the internal issue;
> hopefully it has more details than are in this patch.
Sorry, I forget Lukasz ever filed a bug report for this, please see
https://lore.kernel.org/linux-pci/CAFJ_xbq0cxcH-cgpXLU4Mjk30+muWyWm1aUZGK7iG53yaLBaQg@xxxxxxxxxxxxxx/T/#u
>
> > Actually this bug had a discussion on this thread, and Lukasz
> > Majczak identified this issue on Apollo Lake platform.
> > https://patchwork.kernel.org/project/linux-pci/patch/20220705060014.
> > 10
> > 050-1-vidyas@xxxxxxxxxx/
>
> That patch mentions Dell XPS 13, not a Chromebook, so your patch
> wouldn't affect it. Are you saying this issue is common across all Apollo Lake platforms?
> If so, maybe a fix should be more generic?
The history is
1. Apollo Lake's bridge DSP is connected to a WiFi card, the WiFi card crash while system resume due to Vidya's save/restore L1SS patch.
2. Vidya submitted V4 save/restore L1SS patch, and it work fine now.
3. However Lukasz found the bridge's L1SS capability still disappear even applying Vidya's V1-V4 patch
Allow me to excerpt one of their discussion,
Hi Vidya,
The results from my previous mail are for V3 of your patch;
Amberlake - works fine
Apollolake - still the same issue, but here it is not related to your
changes (we are still working on this).
Best regards,
Lukasz
>
> My point is that there is no PCI requirement for capabilities to be at
> 0x150 and 0x220. The only defined way to find these capabilities is
> to traverse the list starting at offset 0x100.
>
> The L1SS capability at pdev->l1ss is reasonable since we found it by
> traversing that list, but 0x150 and 0x220 are magic numbers with no
> justification. We have no reason to believe there are capabilities there.
>
I agree that, I ever try to recover the link by traversing list, but it didn't work and the capability list have no method to do reverse traversal.
One approach may save the whole capability list before suspend, and check each capability link then restore the missing one after resume.
Do you think it's practical ? It is appreciated if you could recommend a practical solution for this issue.
> We might know this based on device-specific knowledge of the Root
> Port. If that's the case, please cite the Intel spec for device 5ad6
> so we can tell this quirk can't be blindly applied to other Root Ports.
>
> > > > + /* Fix up the pointer to L1SS Capability*/
> > > > + pci_write_config_dword(pdev, 0x150, pdev->l1ss << 20); }
> > > > +DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6,
> > > > +chromeos_fixup_apl_bridge_l1ss_capability);
> > > > +#endif