Re: [PATCH v5 2/3] PCI: qcom: Restrict pci transactions after pci suspend

From: Krishna Chaitanya Chundru
Date: Thu Aug 25 2022 - 09:53:27 EST



On 8/24/2022 10:50 PM, Stephen Boyd wrote:
Quoting Krishna Chaitanya Chundru (2022-08-23 20:37:59)
On 8/9/2022 12:42 AM, Stephen Boyd wrote:
Quoting Krishna chaitanya chundru (2022-08-03 04:28:53)
If the endpoint device is in D0 and its IRQs have not been freed, the
kernel tries to mask the interrupts in the system suspend path by writing
into the vector table (for MSI-X interrupts) and config space (for MSIs).

These transactions are initiated during PM suspend, after the PCIe clocks
have been disabled by the platform driver's PM suspend callback. As a result,
they turn into unclocked accesses and eventually cause crashes.
Why are the platform driver PM suspend calls disabling clks that early?
Can they disable clks in the noirq phase, or even later, so that we don't
have to check whether the device is clocked in the irq poking functions?
It's best to keep irq operations fast, given that these functions are
called from irq flow handlers.
We register the PCIe PM suspend ops as noirq ops only. The MSI-X and
config space accesses come at a later point in time, which is why we
added that check.
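To make the failure mode and the check concrete, here is a minimal userspace model in C. It is not the actual patch: `struct model_pcie_host`, its `clocks_on` flag and the `model_*` functions are hypothetical names chosen for illustration, and an unclocked access is merely counted instead of hanging the bus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a PCIe host controller. On real hardware an access
 * with the controller clocks gated off can crash; here it is only counted. */
struct model_pcie_host {
	bool clocks_on;                 /* cleared by the modelled noirq suspend op */
	uint32_t msi_mask_reg;          /* stand-in for an MSI mask register */
	unsigned int unclocked_accesses;
};

/* Register write that records (and drops) accesses made without clocks. */
static void model_write_mask(struct model_pcie_host *h, uint32_t val)
{
	if (!h->clocks_on) {
		h->unclocked_accesses++;
		return;
	}
	h->msi_mask_reg = val;
}

/* Unguarded mask: always pokes the hardware, as the IRQ core path does today. */
static void model_mask_irq(struct model_pcie_host *h, unsigned int hwirq)
{
	model_write_mask(h, h->msi_mask_reg | (1u << hwirq));
}

/* Guarded mask: the kind of check being proposed, skipping the register
 * access once the host has been suspended. */
static void model_mask_irq_guarded(struct model_pcie_host *h, unsigned int hwirq)
{
	if (!h->clocks_on)
		return;
	model_mask_irq(h, hwirq);
}

/* Modelled noirq suspend callback that gates the clocks. */
static void model_suspend_noirq(struct model_pcie_host *h)
{
	assert(h->clocks_on);
	h->clocks_on = false;
}
```

After model_suspend_noirq(), the unguarded mask produces an unclocked access while the guarded one leaves the register alone. The trade-off Stephen raises is that this extra branch sits in the mask/unmask path that irq flow handlers call.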

What is accessing MSI-X and config space? Can you dump_stack() after the
noirq ops are called and figure out what is trying to access the bus while
it is powered down?

The MSI-X table and config space are being accessed to mask interrupts. The access comes at the very end of suspend, near the point where the non-boot CPUs are disabled. We tried to dump the stack there, but no call stack is printed because it happens so close to CPU disable.
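Whether gating the clocks "even later" helps depends on where CPU teardown falls relative to the device PM phases. Below is a simplified C sketch of the suspend-to-RAM ordering, modelled loosely on kernel/power/suspend.c; the enum and helper names are hypothetical, and s2idle takes a different path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified order of system-suspend phases for suspend-to-RAM. */
enum suspend_phase {
	PHASE_PREPARE,
	PHASE_SUSPEND,
	PHASE_SUSPEND_LATE,
	PHASE_SUSPEND_NOIRQ,        /* device noirq callbacks: clocks gated here */
	PHASE_DISABLE_NONBOOT_CPUS, /* CPU hotplug teardown of secondary CPUs */
	PHASE_SYSCORE_SUSPEND,
	PHASE_COUNT
};

static const char *const phase_names[PHASE_COUNT] = {
	"prepare", "suspend", "suspend_late", "suspend_noirq",
	"disable_nonboot_cpus", "syscore_suspend",
};

/* True if an access made in phase 'access' happens in a strictly later
 * phase than the one that gated the clocks, i.e. it is unclocked. */
static bool access_is_unclocked(enum suspend_phase clk_off, enum suspend_phase access)
{
	return access > clk_off;
}

/* Look up a phase by name; returns -1 if the name is unknown. */
static int phase_index(const char *name)
{
	assert(name != NULL);
	for (size_t i = 0; i < PHASE_COUNT; i++)
		if (strcmp(phase_names[i], name) == 0)
			return (int)i;
	return -1;
}
```

With the clocks gated in suspend_noirq, a mask write issued during disable_nonboot_cpus lands in a strictly later phase and is therefore unclocked, which matches the timing described above.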

However, we did get a dump at resume; please have a look at it:

[   54.946268] Enabling non-boot CPUs ...
[   54.951182] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 5.15.41 #105 43491e4414b1db8a6f59d56b617b520d92a9498e
[   54.961122] Hardware name: Qualcomm Technologies, Inc. sc7280 IDP SKU2 platform (DT)
[   54.969088] Call trace:
[   54.971612]  dump_backtrace+0x0/0x200
[   54.975399]  show_stack+0x20/0x2c
[   54.978826]  dump_stack_lvl+0x6c/0x90
[   54.982614]  dump_stack+0x18/0x38
[   54.986043]  dw_msi_unmask_irq+0x2c/0x58
[   54.990096]  irq_enable+0x58/0x90
[   54.993522]  __irq_startup+0x68/0x94
[   54.997216]  irq_startup+0xf4/0x140
[   55.000820]  irq_affinity_online_cpu+0xc8/0x154
[   55.005491]  cpuhp_invoke_callback+0x19c/0x6e4
[   55.010077]  cpuhp_thread_fun+0x11c/0x188
[   55.014216]  smpboot_thread_fn+0x1ac/0x30c
[   55.018445]  kthread+0x140/0x30c
[   55.021788]  ret_from_fork+0x10/0x20
[   55.028243] CPU1 is up

So we expect the equivalent path to run during suspend while the CPU is being disabled, this time masking the interrupt instead of unmasking it.
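The trace suggests the interrupt has managed affinity: irq_affinity_online_cpu() restarts managed IRQs whose affinity mask gains an online CPU, and the CPU-offline path correspondingly shuts such an IRQ down when the last online CPU in its mask goes away. A toy C model of that behaviour follows; all types and functions are hypothetical, and it simply counts mask/unmask writes made while the PCIe clocks are off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical managed-affinity interrupt. */
struct model_irq {
	uint32_t affinity; /* bit n set: CPU n may service this IRQ */
	bool masked;
};

struct model_system {
	uint32_t online_cpus;
	bool pcie_clocks_on;
	unsigned int unclocked_writes; /* hardware writes made with clocks off */
};

/* Mask or unmask the IRQ, recording whether the write was unclocked. */
static void model_set_mask(struct model_system *s, struct model_irq *irq, bool mask)
{
	if (!s->pcie_clocks_on)
		s->unclocked_writes++;
	irq->masked = mask;
}

static void model_cpu_offline(struct model_system *s, struct model_irq *irq, unsigned int cpu)
{
	assert(s->online_cpus & (1u << cpu));
	s->online_cpus &= ~(1u << cpu);
	/* No online CPU left in the affinity mask: shut the IRQ down. */
	if (!(irq->affinity & s->online_cpus) && !irq->masked)
		model_set_mask(s, irq, true);
}

static void model_cpu_online(struct model_system *s, struct model_irq *irq, unsigned int cpu)
{
	s->online_cpus |= 1u << cpu;
	/* A CPU in the affinity mask is back: start the IRQ up again. */
	if ((irq->affinity & (1u << cpu)) && irq->masked)
		model_set_mask(s, irq, false);
}
```

Offlining CPU1 for an IRQ affine only to CPU1 produces one unclocked mask write. Bringing CPU1 back produces an unclocked unmask as well, assuming, as this model does, that the clocks are still off when secondary CPUs return; in the suspend-to-RAM resume path the secondary CPUs are enabled before the noirq resume callbacks run.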

If there is another way to avoid these calls, could you please point us in that direction?

Thanks & Regards,
Krishna Chaitanya